Power-Softmax: Towards Secure LLM Inference over Encrypted Data

Allon Adir; Ehud Aharoni; Itamar Zimerman; Jenny Lerner; Matan Avitan; Moran Baruch; Nir Drucker; Omri Soceanu; Ramy Masalha; Reut Meiri

arXiv: 2410.09457 · v2 · submitted 2024-10-12 · cs.LG · cs.CR

Pith reviewed 2026-05-23 18:44 UTC · model grok-4.3

keywords Power-Softmax · homomorphic encryption · secure LLM inference · polynomial transformers · encrypted data · in-context learning · transformer variants

The pith

A new Power-Softmax attention variant enables stable training of billion-parameter polynomial LLMs for homomorphic encryption while preserving reasoning performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Power-Softmax as a replacement for the standard softmax in transformer attention layers. This variant is designed to be polynomial, making it compatible with homomorphic encryption for private inference. Previous methods either approximated existing models inefficiently or used simpler but less scalable replacements. By using Power-Softmax, the authors train models exceeding a billion parameters that match standard transformers on reasoning and in-context learning tasks. This advances privacy-preserving LLMs by allowing much larger models than before.

Core claim

The central discovery is that Power-Softmax provides a stable training form for self-attention that is easy to approximate with polynomials, enabling the first polynomial LLMs over a billion parameters with reasoning and ICL capabilities comparable to standard transformers of the same size.

What carries the argument

Power-Softmax, a polynomial-friendly variant of the softmax function in self-attention that replaces the exponential with a power-based form for stability and approximability under encryption.
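The excerpt on this page does not give the functional form of Power-Softmax. The sketch below is therefore an illustration under stated assumptions, not the authors' definition: it assumes the exponential is replaced by an even integer power p, and that a small constant eps is added to the denominator (the Figure 7 caption mentions adding epsilon to the division). The function names, the default p = 4, and the eps value are hypothetical choices made for this example.

```python
import math


def softmax(scores):
    """Standard softmax, shifted by the max for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def power_softmax(scores, p=4, eps=1e-6):
    """Assumed power-based normalization: s_i**p / (sum_j s_j**p + eps).

    p must be a positive even integer so that every term is non-negative.
    Both the form and the defaults are assumptions made for this sketch.
    """
    if p <= 0 or p % 2 != 0:
        raise ValueError("p must be a positive even integer")
    powered = [s ** p for s in scores]
    total = sum(powered) + eps
    return [v / total for v in powered]


if __name__ == "__main__":
    scores = [0.5, -1.0, 2.0, 1.5]
    print("softmax      :", softmax(scores))
    print("power (p=4)  :", power_softmax(scores, p=4))
```

Under this assumed form the only non-polynomial step left is the single division. An even power also treats a score and its negation identically, which is one concrete way the inductive bias could differ from softmax.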

If this is right

Secure inference becomes feasible for LLMs at billion-parameter scale using homomorphic encryption.
Models using Power-Softmax can achieve performance parity with standard transformers on reasoning tasks.
Latency breakdowns for encrypted computations can guide further optimizations in privacy-preserving systems.
Inductive biases differ between Power-Softmax models and standard transformers, which may affect specific task performances.
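The last item can be made concrete with a locality statistic. The Figure 8 caption on this page mentions an "attention mean distance", but the excerpt does not define it. The sketch below uses one common definition, the attention-weighted average of |i - j| over query position i and key position j, and should be read as an assumption rather than the paper's exact metric. The function name and the toy matrices are hypothetical.

```python
def mean_attention_distance(attn):
    """Average, over queries, of sum_j attn[i][j] * |i - j|.

    attn is a list of rows; row i holds the weights that query i places
    on each key j. Rows are assumed to sum to 1.
    """
    n = len(attn)
    total = 0.0
    for i, row in enumerate(attn):
        total += sum(w * abs(i - j) for j, w in enumerate(row))
    return total / n


if __name__ == "__main__":
    # Fully local attention: each position attends only to itself.
    local = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Fully global attention: each position attends uniformly to all keys.
    uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
    print(mean_attention_distance(local))
    print(mean_attention_distance(uniform))
```

A lower value indicates more local attention, so comparing this number across models with different p and against a softmax baseline is one way to quantify the difference in inductive bias.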

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Deploying such models could allow private AI services without exposing user data to the model owner.
Further work might explore combining Power-Softmax with other polynomial approximations for layer normalization to create fully polynomial transformers.
Testing these models on a wider range of benchmarks could reveal where the inductive bias differences matter most.

Load-bearing premise

The Power-Softmax attention can be trained stably at billion-parameter scale and its polynomial approximation preserves sufficient inductive bias to match standard transformer performance.

What would settle it

Training a billion-parameter model with Power-Softmax and finding that its polynomial version underperforms standard transformers significantly on in-context learning benchmarks would challenge the central claim.

Figures

Figures reproduced from arXiv: 2410.09457 by Allon Adir, Ehud Aharoni, Itamar Zimerman, Jenny Lerner, Matan Avitan, Moran Baruch, Nir Drucker, Omri Soceanu, Ramy Masalha, Reut Meiri.

**Figure 1.** Comparison of Softmax and PowerSoftmax normalization on normally distributed values (left), uniformly distributed values (middle), and evenly spaced values (right). The empirical scaling trends are relatively similar. From Section 4.1 (HE-friendly attention): To design a HE-friendly variant of Softmax-based attention, we start by distilling its properties that correlate with its performance:…

**Figure 2.** … (middle) illustrates our HE-friendly training variant, built on top of Eqs. 4 and 5, compared to the original attention

**Figure 3.** …, measured using HElayers 1.5.4 (Aharoni et al., 2023) configured for CKKS with 128-bit security and a poly-degree of 2^16. Here, matrix multiplication took 49% + 18% = 67% of the total time, most of which was spent on encoding the plaintext weights. Polynomial approximation accounted for 14% + 6% + 4% = 24% of the total time, with PowerSoftmax taking 6%. Interestingly, in all polynomial approximations, the most …

**Figure 4.** Training curves for NTP: comparison of test perplexity for transformers with Softmax and power normalization when trained over several datasets, including Pile, Wikitext-103, and Text8. From Section 5.2 (Justify design choices): To justify our design choices, we conduct a series of ablations. Power-Softmax Attention: We first compare PowerSoftmax and Softmax outside the context of HE, showing that in addition to being a H…

**Figure 5.** Results on vision tasks. Training curves for ViT variants with PowerSoftmax (red) and the Softmax baseline (blue). Results are shown for Tiny-ImageNet (left), CIFAR-100 (middle), and CIFAR-10 (right).

**Figure 6.** The significance of the stable variant. Training curves for NTP on Wikitext for large models. The stable variant (red) consistently outperforms the vanilla PowerSoftmax (blue). Stability: To assess the contribution of our numerically stable variant, we conduct dedicated experiments. In…

**Figure 7.** Measuring the polynomial approximation error for different values of ϵ. ϵ-Bounded Division for Softmax: The HE-friendly attention variant from Eq. 4 proposes adding epsilon to make the approximation problem of division easier, resulting in an approximation of a 1/ϵ²-Lipschitz continuous function

**Figure 8.** Measuring the attention mean distance for different transformer variants. PowerSoftmax introduces an important hyperparameter p that differentiates it from the traditional Softmax function. To better understand its mechanistic behavior, we examine how the attention matrices evolve with varying values of p. Our analysis reveals that as p increases, the resulting attention matrices become more localized as…

**Figure 9.** Visualisation of averaged attention matrices, arranged as layer index (rows) by model (columns). From left to right, the models are PowerSoftmax with p = 4, 8, 12 and Softmax.

**Figure 10.** Visualisation of polynomial average attention matrices: Models with P = 4 (first column) generate more local attention matrices, with reduced mass near the diagonal compared to models with P = 8 or P = 12, particularly in layers 4-10. In all models, the final layers (rows at the bottom) display more global attention patterns than the middle layers.

**Figure 11.** Visualisation of random samples of polynomial attention matrices: Although the attention matrices are noisy and a small number of samples may not capture the full distribution trend, the Power-Softmax-based models (first three columns) show behavior similar to the original Softmax (last column). Notably, our attention layers can dynamically adjust focus across different parts of the input, allowing attent…

**Figure 12.** Comparison of training curves for 12-layer RoBERTa models with different attention

**Figure 13.** The impact of different values of ϵ on training dynamics of PowerSoftmax-based models
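The Figure 7 caption appears to describe the ϵ-bounded division as approximating a 1/ϵ²-Lipschitz function. The short derivation below shows where that constant would come from, under the assumption (not spelled out in this excerpt) that the function being approximated is the reciprocal g(x) = 1/x restricted to inputs x ≥ ϵ > 0:

```latex
g(x) = \frac{1}{x}, \qquad g'(x) = -\frac{1}{x^{2}}, \qquad
\sup_{x \ge \epsilon} \lvert g'(x) \rvert = \frac{1}{\epsilon^{2}}
\;\Longrightarrow\;
\lvert g(x) - g(y) \rvert = \frac{\lvert x - y \rvert}{x\,y}
\le \frac{1}{\epsilon^{2}}\,\lvert x - y \rvert
\quad \text{for all } x, y \ge \epsilon .
```

A larger ϵ therefore gives a smaller Lipschitz constant, which is consistent with the caption's remark that adding epsilon makes the division easier to approximate.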

Original abstract

Modern cryptographic methods for implementing privacy-preserving LLMs such as homomorphic encryption (HE) require the LLMs to have a polynomial form. Forming such a representation is challenging because transformers include non-polynomial components, such as Softmax and layer normalization. Previous approaches have either directly approximated pre-trained models with large-degree polynomials, which are less efficient over HE, or replaced non-polynomial components with easier-to-approximate primitives before training, e.g., Softmax with pointwise attention. The latter approach might introduce scalability challenges. We present a new HE-friendly variant of self-attention that offers a stable form for training and is easy to approximate with polynomials for secure inference. Our work introduces the first polynomial LLMs over a billion parameters, exceeding the size of previous models by more than tenfold. The resulting models demonstrate reasoning and in-context learning (ICL) capabilities comparable to standard transformers of the same size, representing a breakthrough in the field. Finally, we provide a detailed latency breakdown for each computation over encrypted data, paving the way for further optimization, and explore the differences in inductive bias between models relying on our HE-friendly variant and standard transformers.
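To illustrate what "polynomial form" demands of a division step, the sketch below approximates a reciprocal using only additions and multiplications, via Goldschmidt's division by convergence (the thesis is listed in the references as [19]). Whether and how the paper applies this method is not stated in the excerpt on this page. The function names, the `upper` bound on the denominator, and the iteration counts are hypothetical choices for this example.

```python
def goldschmidt_reciprocal(d, iterations=6):
    """Approximate 1/d for 0 < d < 2 with additions and multiplications only.

    Uses 1/d = 1/(1 - e) = (1 + e)(1 + e^2)(1 + e^4)... with e = 1 - d.
    After k factors the relative error is e**(2**k).
    """
    e = 1.0 - d
    result = 1.0 + e
    for _ in range(iterations - 1):
        e = e * e                      # e, e^2, e^4, ... by repeated squaring
        result = result * (1.0 + e)
    return result


def eps_bounded_normalize(values, eps, upper, iterations=8):
    """Divide each non-negative value by (sum(values) + eps).

    Assumes 0 <= sum(values) <= upper, where `upper` is a bound known in
    advance (a hypothetical parameter for this sketch).
    """
    scale = 1.0 / (upper + eps)        # constant computable ahead of time
    d = (sum(values) + eps) * scale    # lies in [eps * scale, 1]
    inv = goldschmidt_reciprocal(d, iterations) * scale
    return [v * inv for v in values]


if __name__ == "__main__":
    powered = [0.0625, 1.0, 16.0, 5.0625]   # e.g. scores raised to the 4th power
    print(eps_bounded_normalize(powered, eps=1.0, upper=32.0))
```

The scaled denominator is bounded below by eps * scale, so a larger eps keeps e = 1 - d further from 1 and fewer factors are needed for a given accuracy. This is one way to see why bounding the denominator away from zero eases the approximation.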

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper claims that Power-Softmax enables the first billion-parameter polynomial LLMs for encrypted inference, but the abstract supplies no numbers or details to support the performance claims.

Power-Softmax is a new variant of self-attention designed to be stable for training and straightforward to approximate with low-degree polynomials. This lets the authors claim the first polynomial LLMs larger than a billion parameters that still show reasoning and in-context learning performance close to ordinary transformers. The construction is new in how it modifies attention to fit both training and homomorphic encryption constraints. Earlier work either approximated full models after training, which requires high degrees, or swapped in pointwise functions that may not scale well. The authors also include a breakdown of latency for each part of the computation under encryption, which is useful for seeing where the costs are.

The soft spot is clear from the abstract: it states the models exist and perform comparably but gives none of the numbers, training details, approximation degrees, or error bounds needed to check the claim. The assumption that the variant trains stably at that scale and that the approximation keeps the right inductive bias is not backed up in the text provided. That is the load-bearing part.

This work is aimed at researchers in privacy-preserving machine learning who want to push encrypted inference beyond small models. A reader focused on practical deployment in regulated settings would find the latency analysis and the new primitive relevant. The paper engages with the real technical barriers in the area. I would send this to peer review. The topic matters and the direction is distinct, so referees should check whether the experiments hold up.

Referee Report

2 major / 1 minor

Summary. The paper introduces Power-Softmax, a new self-attention variant intended to be stable for training and amenable to low-degree polynomial approximation, enabling homomorphic-encryption (HE) friendly LLMs. It claims the first such models exceeding one billion parameters (more than 10x prior work), with reasoning and in-context learning performance comparable to standard transformers of the same size, plus a latency breakdown for encrypted inference.

Significance. If the performance and stability claims hold, the result would be a substantial advance for privacy-preserving inference, as it would demonstrate that polynomial LLMs can be scaled to practical sizes while retaining core capabilities.

major comments (2)

[Abstract] Abstract: the central claim that the models exceed prior work by more than tenfold and achieve 'comparable' reasoning/ICL performance supplies no model sizes, benchmark scores, training hyperparameters, polynomial degrees, or approximation-error metrics; without these the 'first' and 'comparable' assertions cannot be evaluated.
[Abstract (and results sections)] The weakest assumption (training stability of Power-Softmax and preservation of inductive bias under polynomial approximation at >1B parameters) is asserted but not supported by any derivation, ablation, or scaling experiment in the provided text; if either fails the headline result collapses.

minor comments (1)

[Abstract] Abstract: 'Power-Softmax' is named without an equation or definition; a brief functional form would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive suggestions. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [Abstract] Abstract: the central claim that the models exceed prior work by more than tenfold and achieve 'comparable' reasoning/ICL performance supplies no model sizes, benchmark scores, training hyperparameters, polynomial degrees, or approximation-error metrics; without these the 'first' and 'comparable' assertions cannot be evaluated.

Authors: We agree that the abstract would be clearer with explicit quantitative details. The full manuscript reports model sizes (1.3B parameters), benchmark scores on reasoning and ICL tasks, training hyperparameters, polynomial degrees used, and approximation errors in the results and experimental sections. In revision we will expand the abstract to include these key figures (e.g., exact parameter counts, selected benchmark accuracies, and degree values) while preserving brevity.

Revision: yes

Referee: [Abstract (and results sections)] The weakest assumption (training stability of Power-Softmax and preservation of inductive bias under polynomial approximation at >1B parameters) is asserted but not supported by any derivation, ablation, or scaling experiment in the provided text; if either fails the headline result collapses.

Authors: The results section presents training curves, loss stability across scales, and direct performance comparisons between Power-Softmax models and standard transformers at >1B parameters, which empirically support both stability and retention of capabilities under the polynomial approximation. However, we acknowledge that dedicated ablations isolating the effect of polynomial degree on inductive bias at this scale would strengthen the claim. We will add such targeted ablations and scaling plots in the revised manuscript.

Revision: partial

Circularity Check

0 steps flagged

No circularity in derivation chain

The paper introduces Power-Softmax as a new attention variant and asserts empirical outcomes (first >1B-parameter polynomial LLMs with comparable reasoning/ICL). The provided abstract and description contain no equations, fitted parameters renamed as predictions, self-citations invoked as uniqueness theorems, or ansatzes smuggled via prior work. All load-bearing claims are external empirical assertions about training stability and approximation fidelity at scale; these are falsifiable outside the paper rather than reducing to inputs by construction. This is the expected non-finding for an empirical methods paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The abstract introduces Power-Softmax as a new attention primitive without specifying its exact functional form or any fitted constants. No free parameters, additional axioms, or invented entities beyond the new primitive itself are mentioned.

invented entities (1)

Power-Softmax (no independent evidence)
Purpose: HE-friendly replacement for softmax in self-attention that remains stable for training and admits low-degree polynomial approximation.
Introduced in the abstract as the core technical contribution enabling the billion-parameter models.

Reference graph

Works this paper leans on

[1] Ehud Aharoni, Allon Adir, Moran Baruch, Nir Drucker, Gilad Ezov, Ariel Farkash, Lev Greenberg, Ramy Masalha, Guy Moshkowich, Dov Murik, et al. HElayers: A tile tensors framework for large neural networks on encrypted data. PoPETs, 2023. doi:10.56553/popets-2023-0020

[2] Adi Akavia and Margarita Vald. On the privacy of protocols based on CPA-secure homomorphic encryption. IACR Cryptol. ePrint Arch., 2021:803, 2021. URL https://eprint.iacr.org/2021/803

[3] Alex Andonian, Quentin Anthony, Stella Biderman, Sid Black, Preetham Gali, Leo Gao, Eric Hallahan, Josh Levy-Kramer, Connor Leahy, Lucas Nestler, Kip Parker, Michael Pieler, Jason Phang, Shivanshu Purohit, Hailey Schoelkopf, Dashiell Stander, Tri Songz, Curt Tigges, Benjamin Thérien, Phil Wang, and Samuel Weinbach. GPT-NeoX: Large scale autoregressive language modeling in PyTorch, September 2023.

[4] Wei Ao and Vishnu Naresh Boddeti. AutoFHE: Automated adaption of CNNs for efficient evaluation over FHE. In 33rd USENIX Security Symposium (USENIX Security 24), pp. 2173–2190, Philadelphia, PA, August 2024. USENIX Association. ISBN 978-1-939133-44-1. URL https://www.usenix.org/conference/usenixsecurity24/presentation/ao

[5] Moran Baruch, Nir Drucker, Lev Greenberg, and Guy Moshkowich. A Methodology for Training Homomorphic Encryption Friendly Neural Networks. In Applied Cryptography and Network Security Workshops, pp. 536–553, Cham, 2022. Springer International Publishing. ISBN 978-3-031-16815-4. doi:10.1007/978-3-031-16815-4_29

[6] Moran Baruch, Nir Drucker, Gilad Ezov, Eyal Kushnir, Jenny Lerner, Omri Soceanu, and Itamar Zimerman. Sensitive Tuning of Large Scale CNNs for E2E Secure Prediction using Homomorphic Encryption. arXiv preprint arXiv:2304.14836, 2023. URL https://arxiv.org/pdf/2304.14836. To appear in CSCML 2024.

[7] Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O'Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, Usvsn Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar Van Der Wal. Pythia: A suite for analyzing large language models across training and scaling. In Andreas Krause, Emma Brunskill, Kyun...

[8] Zvika Brakerski, Craig Gentry, and Vinod Vaikuntanathan. (Leveled) Fully Homomorphic Encryption without Bootstrapping. ACM Trans. Comput. Theory, 6(3), July 2014. ISSN 1942-3454. doi:10.1145/2633600

[9] Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, Jianxin Li, and Furu Wei. THE-X: Privacy-preserving transformer inference with homomorphic encryption. arXiv preprint arXiv:2206.00216, 2022. URL https://arxiv.org/abs/2206.00216

[10] Jung Hee Cheon, Andrey Kim, Miran Kim, and Yongsoo Song. Homomorphic encryption for arithmetic of approximate numbers. In International Conference on the Theory and Application of Cryptology and Information Security, pp. 409–437. Springer, 2017. doi:10.1007/978-3-319-70694-8_15

[11] Grigorios G Chrysos, Stylianos Moschoglou, Giorgos Bouritsas, Yannis Panagakis, Jiankang Deng, and Stefanos Zafeiriou. P-nets: Deep polynomial neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7325–7335, 2020. URL https://openaccess.thecvf.com/content_CVPR_2020/html/Chrysos_P-nets_Deep_Polynomial_...

[12] Yuanchao Ding, Hua Guo, Yewei Guan, Weixin Liu, Jiarong Huo, Zhenyu Guan, and Xiyong Zhang. East: Efficient and accurate secure transformer framework for inference. arXiv preprint arXiv:2308.09923, 2023. URL https://arxiv.org/abs/2308.09923

[13] Nir Drucker and Itamar Zimerman. Efficient skip connections realization for secure inference on encrypted data. In Shlomi Dolev, Ehud Gudes, and Pascal Paillier (eds.), Cyber Security, Cryptology, and Machine Learning, pp. 65–73, Cham, 2023. Springer Nature Switzerland. ISBN 978-3-031-34671-2. doi:10.1007/978-3-031-34671-2_5

[14] Junfeng Fan and Frederik Vercauteren. Somewhat Practical Fully Homomorphic Encryption. Proceedings of the 15th international conference on Practice and Theory in Public Key Cryptography, pp. 1–16, 2012. URL https://eprint.iacr.org/2012/144

[15] Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, et al. The Pile: An 800GB dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020. URL https://arxiv.org/abs/2101.00027

[16] Craig Gentry. A fully homomorphic encryption scheme. PhD thesis, Stanford University, Palo Alto, CA, 2009. URL https://crypto.stanford.edu/craig/craig-thesis.pdf

[17] Ran Gilad-Bachrach, Nathan Dowlin, Kim Laine, Kristin Lauter, Michael Naehrig, and John Wernsing. Cryptonets: Applying neural networks to encrypted data with high throughput and accuracy. In International conference on machine learning, pp. 201–210. PMLR, 2016. URL http://proceedings.mlr.press/v48/gilad-bachrach16.pdf

[18] Aaron Gokaslan and Vanya Cohen. Openwebtext corpus. http://Skylion007.github.io/OpenWebTextCorpus, 2019

[19] Robert E Goldschmidt. Applications of division by convergence. PhD thesis, Massachusetts Institute of Technology, 1964. URL https://dspace.mit.edu/bitstream/handle/1721.1/11113/34136725-MIT.pdf

[20] Vikas Gottemukkula. Polynomial activation functions. OpenReview, 2020. URL https://openreview.net/forum?id=rkxsgkHKvH

[21] Mohit Goyal, Rajan Goyal, and Brejesh Lall. Improved polynomial neural networks with normalised activations. In 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, 2020. doi:10.1109/IJCNN48605.2020.9207535

[22] Kanav Gupta, Neha Jawalkar, Ananta Mukherjee, Nishanth Chandran, Divya Gupta, Ashish Panwar, and Rahul Sharma. SIGMA: Secure GPT inference with function secret sharing. Cryptology ePrint Archive, 2023. URL https://eprint.iacr.org/2023/1269

[23] Jae Hyung Ju, Jaiyoung Park, Jongmin Kim, Donghwan Kim, and Jung Ho Ahn. Neujeans: Private neural network inference with joint optimization of convolution and bootstrapping. arXiv preprint arXiv:2312.04356, 2023. URL https://arxiv.org/abs/2312.04356

[24] Eunsang Lee, Joon-Woo Lee, Junghyun Lee, Young-Sik Kim, Yongjune Kim, Jong-Seon No, and Woosuk Choi. Low-complexity deep convolutional neural networks on fully homomorphic encryption using multiplexed parallel convolutions. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvari, Gang Niu, and Sivan Sabato (eds.), Proceedings of the 39th Intern...

[25] Junghyun Lee, Eunsang Lee, Joon-Woo Lee, Yongjune Kim, Young-Sik Kim, and Jong-Seon No. Precise approximation of convolutional neural networks for homomorphically encrypted data. arXiv preprint arXiv:2105.10879, 2021. URL https://arxiv.org/abs/2105.10879

[26] Junghyun Lee, Eunsang Lee, Young-Sik Kim, Yongwoo Lee, Joon-Woo Lee, Yongjune Kim, and Jong-Seon No. Optimizing layerwise polynomial approximation for efficient private inference on fully homomorphic encryption: A dynamic programming approach. arXiv preprint arXiv:2310.10349, 2023. URL https://arxiv.org/abs/2310.10349

[27] Zi Liang, Pinghui Wang, Ruofei Zhang, Nuo Xu, Shuo Zhang, Lifeng Xing, Haitao Bai, and Ziyang Zhou. MERGE: Fast private text generation. Proceedings of the AAAI Conference on Artificial Intelligence, 38(18):19884–19892, Mar. 2024. doi:10.1609/aaai.v38i18.29964

[28] Xuanqi Liu and Zhuotao Liu. LLMs can understand encrypted prompt: Towards privacy-computing friendly transformers. arXiv preprint arXiv:2305.18396, 2023. URL https://arxiv.org/abs/2305.18396

[29] Yinhan Liu. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019. URL https://arxiv.org/abs/1907.11692

[30] Nicholas Muchinguri. Financial news classification dataset. https://huggingface.co/datasets/nickmuchi/financial-classification, 2022. Accessed: 2024-05-26

[31] Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations, 2019

[32] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2017. URL https://arxiv.org/abs/1706.03762

[33] Jesse Vig and Yonatan Belinkov. Analyzing the structure of attention in a transformer language model. In Proceedings of the 2019 ACL Workshop BlackboxNLP: Analyzing and Interpreting Neural Networks for NLP, pp. 63–76, Florence, Italy, August 2019. Association for Computational Linguistics. doi:10.18653/v1/W19-4808. URL https://aclanthology.org/W19-4808

[34] Alex Wang. GLUE: A multi-task benchmark and analysis platform for natural language understanding. arXiv preprint arXiv:1804.07461, 2018

[35] Biwei Yan, Kun Li, Minghui Xu, Yueyan Dong, Yue Zhang, Zhaochun Ren, and Xiuzheng Cheng. On protecting the data privacy of large language models (LLMs): A survey. arXiv preprint arXiv:2403.05156, 2024. URL https://arxiv.org/abs/2403.05156

[36] Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, and Yue Zhang. A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing, 4(2):100211, 2024. ISSN 2667-2952. doi:10.1016/j.hcc.2024.100211

[37] Chi Zhang, Man Ho Au, and Siu Ming Yiu. Neural networks with (low-precision) polynomial approximations: New insights and techniques for accuracy improvement. arXiv preprint arXiv:2402.11224, 2024a. URL https://arxiv.org/abs/2402.11224

[38] Jiawen Zhang, Jian Liu, Xinpeng Yang, Yinghao Wang, Kejia Chen, Xiaoyang Hou, Kui Ren, and Xiaohu Yang. Secure transformer inference made non-interactive. Cryptology ePrint Archive, 2024b. URL https://eprint.iacr.org/2024/136

[39] Mengxin Zheng, Qian Lou, and Lei Jiang. Primer: Fast private transformer inference on encrypted data. In 2023 60th ACM/IEEE Design Automation Conference (DAC), pp. 1–6, 2023. doi:10.1109/DAC56929.2023.10247719

[40] Jun Zhou, Huimin Qian, Xinbiao Lu, Zhaoxia Duan, Haoqian Huang, and Zhen Shao. Polynomial activation neural networks: Modeling, stability analysis and coverage bp-training. Neurocomputing, 359:227–240, 2019. ISSN 0925-2312. doi:10.1016/j.neucom.2019.06.004

[41] Itamar Zimerman, Moran Baruch, Nir Drucker, Gilad Ezov, Omri Soceanu, and Lior Wolf. Converting transformers to polynomial form for secure inference over homomorphic encryption. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp (eds.), Proceedings of the 41st International Conferen...
