Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference
Pith reviewed 2026-05-25 06:00 UTC · model grok-4.3
The pith
Quadratic polynomials replace ReLU while preserving decisions on a calibration set for homomorphic encryption inference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For calibration sets positive-margin separable in the lifted space, quadratic replacement reduces to a linear separation problem that supplies both necessary and sufficient conditions for calibration-lossless replacement and a constructive algorithm for the coefficients. When the positive-margin condition fails because a few near-boundary samples bring the lifted hulls into contact, reduced convex hulls and Lagrangian-dual soft-margin relaxations cap the weight any single sample can carry and convert the task into smaller convex quadratic programs that produce coefficients with high empirical agreement on calibration-set decisions. At the maximal weight cap the relaxation recovers standard convex-hull separation.
What carries the argument
Lifted-space linear separation that formulates quadratic ReLU replacement as a convex separation task, extended continuously by reduced-convex-hull relaxations when exact separation fails.
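The reason the problem becomes linear can be seen in a small sketch. It assumes a single quadratic q(z) = a z^2 + b z + c shared by all hidden units, which is a simplification made here and may differ from the paper's parameterization. The names `lifted_margin_features` and `quad_logits`, the toy network, and the chosen `theta` are all hypothetical. Under that assumption, each logit gap between the original decision and a rival class is an affine function of (a, b, c), so decision preservation on the calibration set is a system of linear inequalities in the coefficients.

```python
import numpy as np

def lifted_margin_features(X, W1, b1, W2, b2):
    """Return (Phi, offsets, y) with margin = Phi @ (a, b, c) + offsets.

    Assumes one shared quadratic q(z) = a*z**2 + b*z + c for every hidden unit
    (an illustrative simplification, not necessarily the paper's setup).
    """
    Z = X @ W1.T + b1                                # hidden pre-activations
    relu_logits = np.maximum(Z, 0.0) @ W2.T + b2
    y = relu_logits.argmax(axis=1)                   # decisions of the ReLU network
    feats, offsets = [], []
    for i in range(X.shape[0]):
        for k in range(W2.shape[0]):
            if k == y[i]:
                continue
            d = W2[y[i]] - W2[k]                     # output-weight difference
            feats.append([d @ Z[i] ** 2, d @ Z[i], d.sum()])
            offsets.append(b2[y[i]] - b2[k])
    return np.array(feats), np.array(offsets), y

def quad_logits(X, W1, b1, W2, b2, theta):
    """Logits of the same network with ReLU swapped for the quadratic."""
    a, b, c = theta
    Z = X @ W1.T + b1
    return (a * Z ** 2 + b * Z + c) @ W2.T + b2

# Toy network and calibration set (random, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
W1, b1 = rng.normal(size=(8, 5)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

theta = np.array([0.25, 0.5, 0.1])                   # arbitrary candidate (a, b, c)
Phi, off, y = lifted_margin_features(X, W1, b1, W2, b2)
margins = Phi @ theta + off                          # linear in theta
Q = quad_logits(X, W1, b1, W2, b2, theta)
preserved = bool((margins > 0).all())                # True iff every decision is kept
print("calibration decisions preserved:", preserved)
```

Finding coefficients then amounts to searching for a `theta` that makes every entry of `margins` positive. With per-unit coefficients the lifted vector would simply have three entries per hidden unit instead of three in total.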
If this is right
- Exact decision preservation holds if and only if the lifted calibration points admit positive-margin separation.
- Reduced-convex-hull relaxations produce usable coefficients even when a few samples cause the hulls to touch.
- Under CKKS the quadratic activations run 3.7-4.1 times faster than degree-7 Remez polynomials in the activation module.
- End-to-end inference is 1.18-1.68 times faster than the higher-degree baseline while matching plaintext top-1 accuracy.
- No retraining is required; only the calibration set is used to compute the replacement coefficients.
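The weight cap behind the reduced-convex-hull relaxation can be illustrated with a short sketch. It computes the point of a reduced convex hull that lies farthest along a given direction, using the standard greedy rule for weights constrained to [0, mu] and summing to 1. The function name `rch_support_point` and the toy points are hypothetical, and this is not the paper's quadratic program, only the geometric object it relaxes over.

```python
import numpy as np

def rch_support_point(points, direction, mu):
    """Farthest point of the reduced convex hull along `direction`.

    The reduced convex hull is the set of combinations sum_i w_i * p_i with
    0 <= w_i <= mu and sum_i w_i = 1. The maximizer gives weight mu to the
    points with the largest projections until the total weight reaches 1.
    """
    n = len(points)
    if not (1.0 / n <= mu <= 1.0):
        raise ValueError("mu must lie in [1/n, 1]")
    order = np.argsort(points @ direction)[::-1]     # largest projection first
    weights = np.zeros(n)
    remaining = 1.0
    for idx in order:
        w = min(mu, remaining)
        weights[idx] = w
        remaining -= w
        if remaining <= 1e-12:
            break
    return weights @ points, weights

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0]])
u = np.array([1.0, 1.0])

p_full, w_full = rch_support_point(pts, u, mu=1.0)   # standard convex hull
p_min, w_min = rch_support_point(pts, u, mu=0.25)    # every sample capped at 1/4
print(p_full, p_min)
```

At `mu=1.0` a single extreme sample can carry all the weight, which is ordinary convex-hull geometry. At `mu=1/n` the hull collapses to the centroid, so no single near-boundary sample can pull the hulls into contact.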
Where Pith is reading between the lines
- The separation view could be applied layer by layer in deeper networks if calibration sets are constructed per layer.
- Larger or more diverse calibration sets would raise the probability that the exact positive-margin condition holds.
- Measuring decision agreement on test data versus calibration data would quantify how well the finite-set assumption generalizes.
Load-bearing premise
That matching the original decisions exactly on a finite calibration set is enough to make the quadratic replacement useful for new inputs.
What would settle it
If the replaced network produces different classification decisions than the original ReLU network on a held-out test set drawn from the same distribution, the replacement has failed to preserve behavior.
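One way to run that check is to measure how often the two networks pick the same class on held-out inputs. The sketch below is a minimal version of such a metric. The function name `decision_agreement` and the toy logits are hypothetical and are not taken from the paper.

```python
import numpy as np

def decision_agreement(logits_ref, logits_new):
    """Fraction of samples on which both logit arrays select the same class."""
    ref = np.argmax(logits_ref, axis=1)
    new = np.argmax(logits_new, axis=1)
    return float(np.mean(ref == new))

# Toy logits: rows are samples, columns are classes.
relu_out = np.array([[2.0, 1.0], [0.0, 3.0], [1.0, 0.0]])
quad_out = np.array([[1.5, 1.0], [2.0, 1.0], [0.2, 0.1]])
rate = decision_agreement(relu_out, quad_out)
print(f"agreement: {rate:.3f}")
```

Reporting this rate separately for the calibration set and for a held-out set would show how much of the calibration-set guarantee carries over to new inputs.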
Original abstract
Fully homomorphic encryption (FHE) supports only additions and multiplications, so FHE-only neural-network inference typically replaces ReLU with polynomials fitted over empirical activation intervals. Such interval fitting often requires higher-degree polynomials to control activation error, incurring homomorphic evaluation costs, while classification is determined by the final logit decision. We revisit ReLU replacement from a decision-aware perspective: given a trained single-hidden-layer ReLU MLP and a specified calibration set, can an HE-friendly low-degree polynomial replace ReLU without retraining while preserving calibration-set decisions? We focus on quadratic replacement, the lowest-degree that retains a genuine per-unit nonlinearity. For calibration sets positive-margin separable in the lifted space, we formulate quadratic replacement as a linear separation problem, yielding necessary and sufficient conditions for calibration-lossless replacement and a constructive algorithm for the coefficients. When the positive-margin condition fails -- often because a few near-boundary or misclassified calibration samples bring the lifted hulls into contact -- we extend the same geometric framework via reduced convex hulls and Lagrangian-dual soft-margin relaxations. These cap the weight any single sample can carry, converting the problem into smaller convex quadratic programs that yield approximately feasible coefficients with high empirical agreement on calibration-set decisions. In particular, at the maximal weight cap $\mu=1$, the reduced-convex-hull relaxation reduces to standard convex-hull separation; the relaxation thus continuously extends the positive-margin exact theory. Under CKKS, the quadratic replacement matches plaintext top-1 accuracy on multiple benchmarks, running 3.7--4.1$\times$ faster than Remez-7 in the activation module and 1.18--1.68$\times$ faster end-to-end.
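A rough sense of why polynomial degree matters under FHE comes from multiplicative depth. Each ciphertext-by-ciphertext multiplication can at most double the degree, so a degree-d polynomial needs at least ceil(log2 d) sequential multiplications. The sketch below only computes that lower bound. The helper name `min_mult_depth` is hypothetical, the reported speedups are the paper's measurements and are not derived from this count, and real CKKS implementations may consume additional levels for coefficient scaling.

```python
def min_mult_depth(degree: int) -> int:
    """Lower bound on ciphertext-ciphertext multiplicative depth: ceil(log2(degree))."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    # (degree - 1).bit_length() equals ceil(log2(degree)) for degree >= 1.
    return (degree - 1).bit_length()

depth_quadratic = min_mult_depth(2)   # quadratic replacement
depth_degree7 = min_mult_depth(7)     # degree-7 polynomial such as Remez-7
print(depth_quadratic, depth_degree7)
```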
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a decision-aware approach to replacing ReLU activations with quadratic polynomials in single-hidden-layer MLPs for fully homomorphic encryption (FHE) inference. Given a trained model and calibration set, it formulates the problem as finding quadratic coefficients that preserve the decisions on the calibration set. For cases where the lifted-space calibration points are positive-margin separable, this is cast as a linear separation problem with necessary and sufficient conditions and a constructive algorithm. When separability fails, reduced convex hulls and soft-margin relaxations with a weight cap μ are employed to find approximate coefficients. Experiments under the CKKS scheme show that the resulting quadratics achieve the same top-1 accuracy as the plaintext model on benchmarks while providing speedups compared to higher-degree polynomial replacements.
Significance. The geometric formulation in the lifted space offers a clean, constructive method for calibration-lossless replacement that is independent of empirical fitting constants from prior work. The extension via reduced convex hulls provides a continuous family of relaxations controlled by μ. The reported 3.7--4.1× speedup in the activation module and 1.18--1.68× end-to-end are notable. The method's focus on decision preservation rather than uniform approximation is a useful perspective. The stress-test concern regarding generalization from calibration to test sets does not land, as the manuscript reports matching accuracy on the benchmarks.
minor comments (3)
- [Abstract] The specific benchmarks, models, and calibration-set sizes used for the accuracy and timing claims are not named; §4 should list them explicitly with dataset references.
- [§3] The definition of the lifted feature map and the resulting quadratic form should be given as numbered equations in §3 before the separation formulation is stated.
- Figure captions for the geometric illustrations should explicitly reference the value of μ used in each panel.
Simulated Author's Rebuttal
We thank the referee for the positive and constructive review, which accurately captures the geometric formulation, the role of reduced convex hulls, and the reported speedups under CKKS. The recommendation of minor revision is noted; we will incorporate any editorial suggestions in the revised version.
Circularity Check
No significant circularity in geometric separation formulation
full rationale
The paper derives quadratic ReLU replacement by lifting activations to a space where decision preservation on a calibration set reduces to linear separability (or its convex-hull/soft-margin relaxations). This construction directly yields necessary-and-sufficient conditions and coefficients from the separation problem itself; no parameter is fitted on one subset and then renamed as a prediction on a related quantity, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. The weight cap μ is an explicit design parameter, not a hidden dependency. Empirical matching on benchmarks is presented as validation, not as part of the derivation chain. The method is therefore self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- μ (the weight cap in the reduced-convex-hull relaxation)
axioms (2)
- domain assumption: The network is a single-hidden-layer ReLU MLP
- domain assumption: Preserving decisions on the calibration set is the operative correctness criterion