CGF-Softmax: A Cumulant-Based Softmax Reformulation for Efficient Inference under Homomorphic Encryption
Pith reviewed 2026-05-16 08:48 UTC · model grok-4.3
The pith
CGF-softmax reformulates the softmax denominator via the cumulant generating function to reduce multiplicative depth for homomorphic encryption inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CGF-softmax reformulates the softmax denominator through the cumulant generating function, eliminating both homomorphic division and explicit maximum subtraction. This substantially reduces multiplicative depth while preserving key properties of softmax, leading to efficient and accurate approximation in encrypted inference on Vision Transformers and large language models.
What carries the argument
The cumulant generating function reformulation of the softmax denominator, which replaces the sum of exponentials with an alternative expression that avoids encrypted division.
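For orientation, the identity that makes such a rewriting possible can be written out as below. This is the standard relationship between log-sum-exp and the cumulant generating function, stated in notation chosen here (the symbols x_i, n, K, and \kappa_k are assumptions, not taken from the paper), and the paper's exact formulation may differ. The cumulant series is only valid where it converges at t = 1.

```latex
% X is a random variable distributed uniformly over the logits x_1, ..., x_n
K(t) = \log \mathbb{E}\left[e^{tX}\right]
     = \log\left(\frac{1}{n}\sum_{j=1}^{n} e^{t x_j}\right)
\quad\Longrightarrow\quad
\log \sum_{j=1}^{n} e^{x_j} = \log n + K(1)

% Softmax as a single exponential, with no explicit division
\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}
                      = \exp\bigl(x_i - \log n - K(1)\bigr),
\qquad
K(1) = \sum_{k \ge 1} \frac{\kappa_k}{k!}
     = \mu + \frac{\sigma^2}{2} + \frac{\kappa_3}{6} + \cdots
```

Here \mu and \sigma^2 are the mean and (population) variance of the logits, so the leading terms need only sums and products of the inputs.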
Load-bearing premise
The cumulant generating function reformulation preserves the essential normalization and probability properties of softmax with sufficient accuracy for the data distributions arising in Vision Transformers and large language models.
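One way to probe this premise in plaintext, before any encryption is involved, is to keep only the first two cumulants and check how far the outputs drift from exact softmax. The sketch below is a NumPy illustration under that assumption; `cgf2_softmax` and `exact_softmax` are hypothetical names, and the second-order truncation is an illustrative choice, not necessarily the paper's construction.

```python
import numpy as np

def exact_softmax(x):
    """Reference softmax with the usual max subtraction for stability."""
    z = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return z / np.sum(z, axis=-1, keepdims=True)

def cgf2_softmax(x):
    """Second-order cumulant sketch: exp(x_i - log n - mean - var / 2).

    Hypothetical helper. The only divisions are by the known length n
    (inside np.mean), so nothing is divided by a data-dependent value
    and no maximum is taken.
    """
    n = x.shape[-1]
    mean = np.mean(x, axis=-1, keepdims=True)
    var = np.mean((x - mean) ** 2, axis=-1, keepdims=True)
    return np.exp(x - np.log(n) - mean - 0.5 * var)

# Illustrative logits with a small spread (not data from the paper).
logits = np.array([0.3, -0.2, 0.1, 0.0, -0.4, 0.2])
approx = cgf2_softmax(logits)
exact = exact_softmax(logits)
print("sum of approximate outputs:", np.sum(approx))
print("max abs deviation from exact:", np.max(np.abs(approx - exact)))
```

On logits with a small spread the outputs sum to nearly one; the gap grows with the spread of the logits, which is why the data distribution matters for this premise.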
What would settle it
An encrypted inference run on a standard Vision Transformer in which the CGF-softmax version's top-1 accuracy falls more than a few percent below that of the unencrypted baseline or of a high-depth exact softmax.
Original abstract
Homomorphic encryption (HE) is a prominent framework for privacy-preserving machine learning, enabling inference directly on encrypted data. However, evaluating softmax, a core component of transformer architectures, remains particularly challenging in HE due to its multivariate structure, the large dynamic range induced by exponential functions, and the costly division operation. In this paper, we propose CGF-softmax, which reformulates the softmax denominator through the cumulant generating function (CGF). By eliminating both homomorphic division and explicit maximum subtraction, this reformulation substantially reduces multiplicative depth while preserving key properties of softmax. Extensive experiments on Vision Transformers and large language models show that CGF-softmax provides an efficient and accurate approximation of softmax in encrypted inference. In particular, it achieves inference accuracy close to that of high-depth exact methods, while requiring substantially lower computational cost through reduced multiplicative depth.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CGF-Softmax, a reformulation of the softmax denominator via the cumulant generating function (CGF) that eliminates homomorphic division and explicit max subtraction. This reduces multiplicative depth in encrypted inference while aiming to preserve normalization and probability properties of standard softmax. Experiments on Vision Transformers and large language models report inference accuracy close to high-depth exact methods at substantially lower computational cost.
Significance. If the approximation accuracy holds with quantifiable error bounds, the work would meaningfully advance practical HE inference for transformers by lowering the multiplicative depth of a core non-linear operation. The parameter-free, direct mathematical rewriting (no fitted parameters or self-referential definitions) is a methodological strength that distinguishes it from many prior approximations.
major comments (2)
- [Abstract and Experiments] Abstract and Experiments: The central claim that CGF-softmax achieves 'accuracy close to that of high-depth exact methods' is unsupported by quantitative error bounds, maximum deviation metrics, or KL-divergence measurements between the approximated and exact softmax outputs. Without these, it is impossible to assess whether the approximation preserves essential properties with sufficient accuracy for ViT and LLM data distributions.
- [Method (CGF reformulation)] Reformulation and validation: No ablation on the CGF approximation order (e.g., truncation level) is provided, nor is there explicit verification or bound showing how the reformulation maintains the sum-to-one normalization property after eliminating division. This directly affects the load-bearing assumption that key softmax properties are preserved at the reported depths.
minor comments (1)
- [Abstract] Abstract: The phrase 'high-depth exact methods' should be accompanied by citations to the specific prior HE softmax implementations being compared.
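The output-level metrics asked for in the first major comment could be gathered with a helper like the following. It is a plaintext NumPy sketch; `approximation_metrics` is a hypothetical name, and the perturbed array `q` is only a stand-in for the output of an approximate softmax, not data from the paper.

```python
import numpy as np

def softmax(x):
    """Exact softmax over the last axis."""
    z = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return z / np.sum(z, axis=-1, keepdims=True)

def approximation_metrics(p_exact, q_approx, eps=1e-12):
    """Compare rows of two (batch, n) arrays of softmax outputs."""
    max_abs_dev = np.max(np.abs(p_exact - q_approx))
    # Renormalise q for the KL term only, so KL compares two distributions.
    q_norm = q_approx / np.sum(q_approx, axis=-1, keepdims=True)
    kl = np.sum(p_exact * (np.log(p_exact + eps) - np.log(q_norm + eps)), axis=-1)
    # Largest deviation of a row sum from 1 (the normalization property).
    max_sum_dev = np.max(np.abs(np.sum(q_approx, axis=-1) - 1.0))
    return {
        "max_abs_dev": float(max_abs_dev),
        "mean_kl": float(np.mean(kl)),
        "max_sum_dev": float(max_sum_dev),
    }

rng = np.random.default_rng(0)
logits = rng.normal(scale=0.5, size=(8, 16))
p = softmax(logits)
# Stand-in for an approximate softmax: exact output with a small relative error.
q = p * (1.0 + 1e-3 * rng.standard_normal(p.shape))
metrics = approximation_metrics(p, q)
print(metrics)
```

Reporting the row-sum deviation separately from the KL term keeps the normalization question (second major comment) visible instead of hiding it behind renormalisation.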
Simulated Author's Rebuttal
We thank the referee for the constructive comments and for recognizing the methodological contribution of the parameter-free CGF reformulation. We address each major point below and will revise the manuscript to incorporate the requested quantitative support and validation.
- Referee: [Abstract and Experiments] Abstract and Experiments: The central claim that CGF-softmax achieves 'accuracy close to that of high-depth exact methods' is unsupported by quantitative error bounds, maximum deviation metrics, or KL-divergence measurements between the approximated and exact softmax outputs. Without these, it is impossible to assess whether the approximation preserves essential properties with sufficient accuracy for ViT and LLM data distributions.
  Authors: We agree that explicit quantitative error metrics are needed to substantiate the accuracy claims. In the revised manuscript we will add, in both the abstract and the experiments section, maximum absolute deviation, mean KL-divergence, and per-layer error bounds between CGF-Softmax and exact softmax, computed on the same ViT and LLM evaluation sets. These metrics will be reported alongside the existing accuracy figures to allow direct assessment of approximation quality.
  Revision: yes
- Referee: [Method (CGF reformulation)] Reformulation and validation: No ablation on the CGF approximation order (e.g., truncation level) is provided, nor is there explicit verification or bound showing how the reformulation maintains the sum-to-one normalization property after eliminating division. This directly affects the load-bearing assumption that key softmax properties are preserved at the reported depths.
  Authors: We will add an ablation study varying the CGF truncation order (from order 2 to 6) and report its impact on multiplicative depth, inference accuracy, and runtime. For the normalization property, the CGF reformulation is derived from the exact cumulant-generating-function identity of the log-sum-exp; truncation error is bounded by the remainder term of the Taylor series. We will include a short proof sketch showing that the approximated outputs sum to 1 within the truncation error plus HE noise, together with empirical verification that the sum deviates by less than 10^{-4} on the evaluated datasets. These additions will appear in the revised Method and Experiments sections.
  Revision: yes
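The truncation-order ablation described in this response can be prototyped in plaintext. The sketch below assumes that truncation means keeping the first `order` cumulants of the logits in the series log((1/n) * sum_j exp(x_j)) = sum_k kappa_k / k!, with cumulants obtained from raw moments by the standard moment-to-cumulant recursion. The names `cumulants` and `cgf_softmax` are hypothetical, the logits are illustrative, and the paper's actual scheme may differ.

```python
import numpy as np
from math import comb, factorial

def cumulants(x, order):
    """Cumulants kappa_1..kappa_order of the empirical distribution of a 1-D x.

    Uses the recursion
    kappa_n = m_n - sum_{k=1}^{n-1} C(n-1, k-1) * kappa_k * m_{n-k},
    where m_k is the k-th raw moment.
    """
    m = [1.0] + [float(np.mean(x ** k)) for k in range(1, order + 1)]
    kappa = [0.0] * (order + 1)
    for n in range(1, order + 1):
        correction = sum(comb(n - 1, k - 1) * kappa[k] * m[n - k] for k in range(1, n))
        kappa[n] = m[n] - correction
    return kappa[1:]

def cgf_softmax(x, order):
    """Softmax sketch exp(x_i - log n - K), with K truncated at `order` cumulants."""
    kappa = cumulants(x, order)
    k_trunc = sum(kap / factorial(k) for k, kap in enumerate(kappa, start=1))
    return np.exp(x - np.log(x.size) - k_trunc)

# Illustrative logits with a small spread (an assumption, not paper data).
logits = np.array([0.3, -0.2, 0.1, 0.0, -0.4, 0.2])
sum_dev = {m: abs(float(np.sum(cgf_softmax(logits, m))) - 1.0) for m in range(2, 7)}
for m, dev in sum_dev.items():
    print(f"order {m}: |sum - 1| = {dev:.2e}")
```

For logits with a wide spread the cumulant series need not converge, so adding orders does not always help; an ablation would ideally report that regime as well as the well-behaved one.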
Circularity Check
No significant circularity in derivation chain
The paper presents CGF-softmax as a direct mathematical reformulation of the softmax denominator via cumulant generating function properties, eliminating homomorphic division and max subtraction without relying on fitted parameters, self-referential predictions, or load-bearing self-citations. The central claim reduces to standard CGF definitions applied to the exponential sum, which is an independent rewriting rather than a tautology or a fit renamed as a prediction. No equation in the provided abstract or description equates the output to its inputs by construction, and the experimental validation on ViTs and LLMs serves as an external check rather than internal forcing. This is a self-contained algebraic step that does not rest on prior work by the same authors as its sole justification.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The cumulant generating function can be truncated or approximated to replace the softmax normalization term while preserving key monotonicity and positivity properties.