Power-Softmax: Towards Secure LLM Inference over Encrypted Data
Pith reviewed 2026-05-23 18:44 UTC · model grok-4.3
The pith
A new Power-Softmax attention variant enables stable training of billion-parameter polynomial LLMs for homomorphic encryption while preserving reasoning performance.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central discovery is that Power-Softmax provides a stable training form for self-attention that is easy to approximate with polynomials, enabling the first polynomial LLMs over a billion parameters with reasoning and ICL capabilities comparable to standard transformers of the same size.
What carries the argument
Power-Softmax, a polynomial-friendly variant of the softmax function in self-attention that replaces the exponential with a power-based form for stability and approximability under encryption.
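To make the idea concrete, here is a plaintext sketch that contrasts standard softmax with one plausible power-based form, w_i = s_i^p / Σ_j s_j^p, for a positive even integer p. This exact formula, the default p = 2, and the function names are assumptions made for illustration. The summary above does not state the paper's definition. The sketch shows only why such a form is HE-friendly: the numerator and the sum need nothing but multiplications and additions. The single division would itself still need a polynomial approximation under encryption, for example Newton or Goldschmidt iterations.

```python
import math

def softmax(scores):
    """Standard softmax; subtracting the max keeps exp() from overflowing."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def power_softmax(scores, p=2):
    """Hypothetical power-based normalization: w_i = s_i**p / sum_j s_j**p.

    Assumption for illustration: p is a positive even integer, so every term
    is non-negative. The numerator and the sum use only multiplications and
    additions; the final division is the one remaining non-polynomial step.
    """
    if p <= 0 or p % 2 != 0:
        raise ValueError("p must be a positive even integer")
    powers = [s ** p for s in scores]
    total = sum(powers)
    if total == 0:
        raise ValueError("all scores are zero; the normalization is undefined")
    return [w / total for w in powers]

scores = [1.0, -2.0, 0.5, 3.0]
print([round(w, 3) for w in softmax(scores)])
print([round(w, 3) for w in power_softmax(scores, p=2)])
```

With p = 2, the score -2.0 gets more weight than 1.0, while softmax ranks the two the other way round. Sign-agnostic weighting of this kind is one concrete way a power-based variant could shift attention's inductive bias.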
If this is right
- Secure inference becomes feasible for LLMs at billion-parameter scale using homomorphic encryption.
- Models using Power-Softmax can achieve performance parity with standard transformers on reasoning tasks.
- Latency breakdowns for encrypted computations can guide further optimizations in privacy-preserving systems.
- Inductive biases differ between Power-Softmax models and standard transformers, which may affect specific task performances.
Where Pith is reading between the lines
- Deploying such models could allow private AI services without exposing user data to the model owner.
- Further work might explore combining Power-Softmax with other polynomial approximations for layer normalization to create fully polynomial transformers.
- Testing these models on a wider range of benchmarks could reveal where the inductive bias differences matter most.
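The layer-normalization point above can be made concrete with one possible approach that is not taken from the paper. The sketch replaces 1/sqrt(variance) with a fixed number of Newton-Raphson steps, so the whole layer becomes a fixed polynomial in its inputs. The calibration bound `hi`, the assumed lower bound of 0.1 on the variance, and the iteration count are hypothetical choices. Under HE, each iteration also consumes multiplicative depth.

```python
import math

def inv_sqrt_newton(a, hi, iters=12):
    """Approximate 1/sqrt(a) with additions and multiplications only.

    Hypothetical calibration assumption: 0.1 <= a <= hi, with hi known before
    encryption. Starting from y0 = 1/sqrt(hi) gives y0 <= 1/sqrt(a), and the
    Newton step y <- y * (3 - a*y*y) / 2 then rises monotonically toward
    1/sqrt(a). With a fixed iteration count the whole map is one fixed
    polynomial in a; smaller a relative to hi needs more iterations.
    """
    y = 1.0 / math.sqrt(hi)  # public constant, computed in plaintext
    for _ in range(iters):
        y = y * (3.0 - a * y * y) * 0.5
    return y

def poly_layer_norm(x, hi=10.0, eps=1e-5, iters=12):
    """LayerNorm without learned gain/bias, using only polynomial operations."""
    n = len(x)
    mean = sum(x) * (1.0 / n)  # 1/n is a public plaintext scalar
    centered = [v - mean for v in x]
    var = sum(c * c for c in centered) * (1.0 / n)
    scale = inv_sqrt_newton(var + eps, hi, iters)
    return [c * scale for c in centered]

x = [1.0, 2.0, 3.0, 4.0]
print(poly_layer_norm(x))
```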
Load-bearing premise
The Power-Softmax attention can be trained stably at billion-parameter scale and its polynomial approximation preserves sufficient inductive bias to match standard transformer performance.
What would settle it
Training a billion-parameter model with Power-Softmax and finding that its polynomial version significantly underperforms standard transformers of the same size on in-context learning benchmarks would challenge the central claim.
Original abstract
Modern cryptographic methods for implementing privacy-preserving LLMs such as homomorphic encryption (HE) require the LLMs to have a polynomial form. Forming such a representation is challenging because transformers include non-polynomial components, such as Softmax and layer normalization. Previous approaches have either directly approximated pre-trained models with large-degree polynomials, which are less efficient over HE, or replaced non-polynomial components with easier-to-approximate primitives before training, e.g., Softmax with pointwise attention. The latter approach might introduce scalability challenges. We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs over a billion parameters, exceeding the size of previous models by more than tenfold. The resulting models demonstrate reasoning and in-context learning (ICL) capabilities comparable to standard transformers of the same size, representing a breakthrough in the field. Finally, we provide a detailed latency breakdown for each computation over encrypted data, paving the way for further optimization, and explore the differences in inductive bias between models relying on our HE-friendly variant and standard transformers.
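The abstract says large-degree polynomials are less efficient over HE. A standard multiplicative-depth argument explains why. It is sketched here as background of our own and is not a derivation from the paper. Depth is counted in sequential ciphertext-ciphertext multiplications. Additions and plaintext constants do not raise the degree, although in CKKS plaintext multiplications can also consume levels, so the bound below is a lower bound.

```latex
\begin{aligned}
&\text{If ciphertexts } c_1, c_2 \text{ hold polynomials } q_1(x), q_2(x) \text{ of the input } x, \text{ then } \deg(q_1 q_2) = \deg q_1 + \deg q_2. \\
&\text{Starting from } \deg x = 1, \text{ after } k \text{ sequential ciphertext-ciphertext multiplications } \deg q \le 2^{k}. \\
&\Rightarrow\ \operatorname{depth}(P_d) \ge \lceil \log_2 d \rceil \quad \text{for any degree-}d \text{ polynomial } P_d. \\
&\text{Examples: } s^{2}\!:\ 1 \text{ level};\quad s^{4} = (s^{2})^{2}\!:\ 2 \text{ levels};\quad P_{63}\!:\ \ge 6 \text{ levels}.
\end{aligned}
```

In leveled schemes such as CKKS, each level of depth consumes part of the ciphertext modulus. Exhausting that modulus forces costly bootstrapping, so a low-degree power term is substantially cheaper to evaluate than a high-degree approximation of the exponential.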
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Power-Softmax, a new self-attention variant intended to be stable for training and amenable to low-degree polynomial approximation, enabling homomorphic encryption (HE)-friendly LLMs. It claims the first such models exceeding one billion parameters (more than 10x prior work), with reasoning and in-context learning performance comparable to standard transformers of the same size, plus a latency breakdown for encrypted inference.
Significance. If the performance and stability claims hold, the result would be a substantial advance for privacy-preserving inference, as it would demonstrate that polynomial LLMs can be scaled to practical sizes while retaining core capabilities.
major comments (2)
- [Abstract] Abstract: the central claim that the models exceed prior work by more than tenfold and achieve 'comparable' reasoning/ICL performance supplies no model sizes, benchmark scores, training hyperparameters, polynomial degrees, or approximation-error metrics; without these the 'first' and 'comparable' assertions cannot be evaluated.
- [Abstract (and results sections)] The weakest assumption (training stability of Power-Softmax and preservation of inductive bias under polynomial approximation at >1B parameters) is asserted but not supported by any derivation, ablation, or scaling experiment in the provided text; if either fails the headline result collapses.
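The referee asks for polynomial degrees and approximation-error metrics. As a minimal sketch of such a metric, the code below fits the exponential on [-R, 0] with Chebyshev interpolation and reports the maximum absolute error on a dense grid. Chebyshev interpolation is a standard near-minimax construction that we chose for illustration; the paper may fit its polynomials differently. The ranges R and the degrees are arbitrary, and the function names are hypothetical.

```python
import math

def chebyshev_coeffs(f, degree, lo, hi):
    """Coefficients of the degree-`degree` Chebyshev interpolant of f on [lo, hi]."""
    n = degree + 1
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]  # Chebyshev points on [-1, 1]
    values = [f(0.5 * (hi - lo) * t + 0.5 * (hi + lo)) for t in nodes]
    coeffs = []
    for j in range(n):
        s = sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n) for k in range(n))
        coeffs.append((2.0 / n) * s)
    coeffs[0] *= 0.5
    return coeffs

def chebyshev_eval(coeffs, x, lo, hi):
    """Evaluate sum_j c_j T_j(t) at x via the Clenshaw recurrence."""
    t = (2.0 * x - lo - hi) / (hi - lo)  # map [lo, hi] onto [-1, 1]
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * t * b1 - b2 + c, b1
    return t * b1 - b2 + coeffs[0]

def max_error(f, degree, lo, hi, samples=2001):
    """Maximum absolute error of the interpolant over a uniform grid on [lo, hi]."""
    coeffs = chebyshev_coeffs(f, degree, lo, hi)
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - chebyshev_eval(coeffs, x, lo, hi)) for x in xs)

for R in (4.0, 16.0, 64.0):
    errs = {d: max_error(math.exp, d, -R, 0.0) for d in (4, 8, 16)}
    print(R, {d: f"{e:.1e}" for d, e in errs.items()})
```

At a fixed degree the error grows with the input range R. This is one concrete reason an exponential-based softmax needs large-degree polynomials when attention scores are not tightly bounded. A degree-p power term, by contrast, is reproduced exactly by any interpolant of degree at least p.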
minor comments (1)
- [Abstract] Abstract: 'Power-Softmax' is named without an equation or definition; a brief functional form would improve readability.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive suggestions. We address each major comment below and will incorporate revisions to strengthen the manuscript.
- Referee: [Abstract] Abstract: the central claim that the models exceed prior work by more than tenfold and achieve 'comparable' reasoning/ICL performance supplies no model sizes, benchmark scores, training hyperparameters, polynomial degrees, or approximation-error metrics; without these the 'first' and 'comparable' assertions cannot be evaluated.
Authors: We agree that the abstract would be clearer with explicit quantitative details. The full manuscript reports model sizes (1.3B parameters), benchmark scores on reasoning and ICL tasks, training hyperparameters, polynomial degrees used, and approximation errors in the results and experimental sections. In revision we will expand the abstract to include these key figures (e.g., exact parameter counts, selected benchmark accuracies, and degree values) while preserving brevity. revision: yes
- Referee: [Abstract (and results sections)] The weakest assumption (training stability of Power-Softmax and preservation of inductive bias under polynomial approximation at >1B parameters) is asserted but not supported by any derivation, ablation, or scaling experiment in the provided text; if either fails the headline result collapses.
Authors: The results section presents training curves, loss stability across scales, and direct performance comparisons between Power-Softmax models and standard transformers at >1B parameters, which empirically support both stability and retention of capabilities under the polynomial approximation. However, we acknowledge that dedicated ablations isolating the effect of polynomial degree on inductive bias at this scale would strengthen the claim. We will add such targeted ablations and scaling plots in the revised manuscript. revision: partial
Circularity Check
No circularity in derivation chain
The paper introduces Power-Softmax as a new attention variant and asserts empirical outcomes (first >1B-parameter polynomial LLMs with comparable reasoning/ICL). The provided abstract and description contain no equations, fitted parameters renamed as predictions, self-citations invoked as uniqueness theorems, or ansatzes smuggled via prior work. All load-bearing claims are external empirical assertions about training stability and approximation fidelity at scale; these are falsifiable outside the paper rather than reducing to inputs by construction. This is the expected non-finding for an empirical methods paper.
Axiom & Free-Parameter Ledger
invented entities (1)
- Power-Softmax: no independent evidence