The-x: Privacy-preserving transformer inference with homomorphic encryption
4 Pith papers cite this work.
citing papers explorer
- Power-Softmax: Towards Secure LLM Inference over Encrypted Data
  Power-Softmax is a new HE-compatible attention variant that permits training and inference of billion-parameter polynomial LLMs with performance matching standard transformers.
- On the (In-)Security of the Shuffling Defense in the Transformer Secure Inference
  An attack aligns differently shuffled intermediate activations from secure Transformer inference queries to recover model weights with low error using roughly one dollar of queries.
- Beyond Latency: A System-Level Characterization of MPC and FHE for PPML
  System-level evaluation reveals that network constraints and hardware costs, rather than raw latency, often dictate the optimal choice between MPC and FHE for privacy-preserving ML.
- ConfusionPrompt: Practical Private Inference for Online Large Language Models
  ConfusionPrompt enables private black-box LLM inference via prompt decomposition and pseudo-prompt mixing, claiming a better privacy-utility trade-off than perturbation methods and lower memory use than open-source local models.