pith.

arXiv: 2605.22237 · v2 · pith:QBDIFR73 · submitted 2026-05-21 · 💻 cs.CR · cs.LG

Decision-Aware Quadratic ReLU Replacement for HE-Friendly Inference

Pith reviewed 2026-05-25 06:00 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords: homomorphic encryption · ReLU replacement · quadratic polynomial · decision preservation · neural network inference · CKKS · convex hull relaxation

The pith

Quadratic polynomials replace ReLU while preserving decisions on a calibration set for homomorphic encryption inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a decision-aware method to swap ReLU activations for quadratic polynomials in single-hidden-layer neural networks without retraining. When a calibration set is positive-margin separable after lifting to quadratic features, the coefficients are found by solving a linear separation problem that gives necessary and sufficient conditions for exact decision preservation. When boundary samples prevent separation, reduced convex hulls and soft-margin relaxations produce coefficients that still agree with the original decisions on most calibration points. Evaluated under the CKKS encryption scheme, the replacement runs faster than a degree-7 Remez baseline while matching plaintext top-1 accuracy on multiple benchmarks.

Core claim

For calibration sets positive-margin separable in the lifted space, quadratic replacement reduces to a linear separation problem that supplies both necessary and sufficient conditions for calibration-lossless replacement and a constructive algorithm for the coefficients. When the positive-margin condition fails because a few near-boundary samples bring the lifted hulls into contact, reduced convex hulls and Lagrangian-dual soft-margin relaxations cap the weight any single sample can carry and convert the task into smaller convex quadratic programs that produce coefficients with high empirical agreement on calibration-set decisions. At the maximal weight cap μ = 1 the relaxation recovers standard convex-hull separation, so it continuously extends the exact positive-margin theory.
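Why the problem becomes linear can be seen in a short derivation. It assumes, beyond what this page states, a binary classifier with a single logit and one quadratic $q(z) = a z^2 + b z + c$ shared by all hidden units; per-unit coefficients would only enlarge the lifted feature vector. With pre-activations $z_j(x) = w_j^\top x + \beta_j$ and output weights $v_j$:

```latex
\begin{aligned}
f(x) &= \sum_{j=1}^{H} v_j \,\mathrm{ReLU}\big(z_j(x)\big) + d, \\
\tilde f(x) &= \sum_{j=1}^{H} v_j \big(a\, z_j(x)^2 + b\, z_j(x) + c\big) + d
  = a\,\phi_2(x) + b\,\phi_1(x) + t, \\
\phi_2(x) &= \sum_{j=1}^{H} v_j\, z_j(x)^2, \qquad
\phi_1(x) = \sum_{j=1}^{H} v_j\, z_j(x), \qquad
t = c \sum_{j=1}^{H} v_j + d, \\
&\text{decisions preserved} \iff
y_i \big(a\,\phi_2(x_i) + b\,\phi_1(x_i) + t\big) > 0 \ \ \forall i,
\qquad y_i = \operatorname{sign} f(x_i).
\end{aligned}
```

The last line is linear separability of the lifted points $(\phi_2(x_i), \phi_1(x_i))$ with labels $y_i$. Because the calibration set is finite, any strict separator can be divided by its smallest value of $y_i\,\tilde f(x_i)$ to reach margin 1, so strict and positive-margin separability coincide here.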

What carries the argument

Lifted-space linear separation that formulates quadratic ReLU replacement as a convex separation task, extended continuously by reduced-convex-hull relaxations when exact separation fails.
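The constructive side can be sketched in NumPy under the same assumptions (binary single-logit head, one shared quadratic $a z^2 + b z + c$). The lifted features are $(\sum_j v_j z_j^2, \sum_j v_j z_j)$, and the offset $t = c\sum_j v_j + d$ absorbs $c$, so any linear separator of the lifted points yields coefficients. A plain perceptron is swapped in as the separator finder; it is not the paper's algorithm, and every function name here is hypothetical. Recovering $c$ from $t$ assumes $\sum_j v_j \neq 0$.

```python
import numpy as np

def relu_logit(X, W, beta, v, d):
    """Binary logit of a single-hidden-layer ReLU MLP: v . relu(W x + beta) + d."""
    Z = X @ W.T + beta                        # pre-activations, shape (n, H)
    return np.maximum(Z, 0.0) @ v + d

def quad_logit(X, W, beta, v, d, a, b, c):
    """Same network with ReLU replaced by the shared quadratic a z^2 + b z + c."""
    Z = X @ W.T + beta
    return (a * Z**2 + b * Z + c) @ v + d

def lift(X, W, beta, v):
    """Lifted features (sum_j v_j z_j^2, sum_j v_j z_j) for each input row."""
    Z = X @ W.T + beta
    return np.column_stack([(Z**2) @ v, Z @ v])

def fit_quadratic(X, W, beta, v, d, max_epochs=10_000):
    """Perceptron in the lifted space; returns (a, b, c), or None if it never converges."""
    y = np.sign(relu_logit(X, W, beta, v, d))         # original decisions to preserve
    keep = y != 0                                     # skip points exactly on the boundary
    Phi = np.column_stack([lift(X, W, beta, v), np.ones(len(X))])[keep]
    y = y[keep]
    theta = np.zeros(3)                               # (a, b, t) with t = c * sum(v) + d
    for _ in range(max_epochs):
        mistakes = 0
        for phi_i, y_i in zip(Phi, y):
            if y_i * (theta @ phi_i) <= 0:            # decision flipped or on the hyperplane
                theta += y_i * phi_i
                mistakes += 1
        if mistakes == 0:                             # every kept decision strictly preserved
            a, b, t = theta
            c = (t - d) / v.sum()                     # assumes sum(v) != 0
            return a, b, c
    return None
```

The perceptron halts within finitely many epochs if and only if the lifted points are separable, so `max_epochs` only bounds the non-separable case, where the relaxations come into play.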

If this is right

  • Exact decision preservation holds if and only if the lifted calibration points admit positive-margin separation.
  • Reduced-convex-hull relaxations produce usable coefficients even when a few samples cause the hulls to touch.
  • Under CKKS the quadratic activations run 3.7-4.1 times faster than degree-7 Remez polynomials in the activation module.
  • End-to-end inference is 1.18-1.68 times faster than the higher-degree baseline while matching plaintext top-1 accuracy.
  • No retraining is required; only the calibration set is used to compute the replacement coefficients.
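One reason a quadratic is cheaper under CKKS is multiplicative depth: every ciphertext-ciphertext multiplication at most doubles the degree, so a degree-d polynomial needs at least ⌈log2 d⌉ sequential multiplications, and in CKKS each multiplication followed by rescaling typically consumes one level. The sketch below compares the two activations named in the list above. It is only a lower-bound calculation and does not attempt to reproduce the measured 3.7-4.1× or 1.18-1.68× speedups, which depend on the evaluation strategy and parameters.

```python
import math

def min_mult_depth(degree: int) -> int:
    """Lower bound on ciphertext-ciphertext multiplicative depth for a polynomial of this degree.

    Each multiplication at most doubles the degree, so reaching x**degree
    needs at least ceil(log2(degree)) sequential multiplications.
    """
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return math.ceil(math.log2(degree))

for name, deg in [("quadratic", 2), ("Remez-7", 7)]:
    print(f"{name}: degree {deg}, depth >= {min_mult_depth(deg)}")
# quadratic: degree 2, depth >= 1
# Remez-7: degree 7, depth >= 3
```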

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation view could be applied layer by layer in deeper networks if calibration sets are constructed per layer.
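  • Larger or more diverse calibration sets make the exact positive-margin condition harder, not easier, to satisfy, because each added sample adds a constraint on the coefficients; exact preservation on such sets would, however, be stronger evidence that the replacement generalizes.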
  • Measuring decision agreement on test data versus calibration data would quantify how well the finite-set assumption generalizes.

Load-bearing premise

That matching the original decisions exactly on a finite calibration set is enough to make the quadratic replacement useful for new inputs.

What would settle it

If the replaced network produces different classification decisions than the original ReLU network on a held-out test set drawn from the same distribution, the replacement has failed to preserve behavior.
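A held-out check of this kind only needs the two models' outputs. The helper below (a hypothetical name, assuming logits arrive as NumPy-compatible arrays) reports the fraction of inputs on which the ReLU network and its quadratic replacement make the same decision: the sign of a single logit in the binary case, the arg-max of the logit vector otherwise.

```python
import numpy as np

def decision_agreement(logits_ref, logits_new):
    """Fraction of inputs on which two classifiers make the same decision.

    1-D inputs are treated as binary logits (decision = sign); 2-D inputs as
    per-class logits (decision = arg-max over the last axis).
    """
    ref = np.asarray(logits_ref, dtype=float)
    new = np.asarray(logits_new, dtype=float)
    if ref.shape != new.shape:
        raise ValueError("logit arrays must have the same shape")
    if ref.ndim == 1:
        same = np.sign(ref) == np.sign(new)
    else:
        same = ref.argmax(axis=-1) == new.argmax(axis=-1)
    return float(np.mean(same))
```

Computing it once on the calibration set and once on a held-out split from the same distribution gives the calibration-versus-test gap; an agreement of 1.0 on the calibration set but lower on the held-out split would indicate the failure mode described above.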

Figures

Figures reproduced from arXiv: 2605.22237 by Rui Li, Weijie Miao, Wenyuan Wu.

Figure 1: Binary post-training quadratic replacement as positive-margin hyper…
Figure 2: HE top-1 accuracy versus the CKKS scaling factor
Original abstract

Fully homomorphic encryption (FHE) supports only additions and multiplications, so FHE-only neural-network inference typically replaces ReLU with polynomials fitted over empirical activation intervals. Such interval fitting often requires higher-degree polynomials to control activation error, which raises homomorphic evaluation cost, even though classification depends only on the final logit decision. We revisit ReLU replacement from a decision-aware perspective: given a trained single-hidden-layer ReLU MLP and a specified calibration set, can an HE-friendly low-degree polynomial replace ReLU without retraining while preserving calibration-set decisions? We focus on quadratic replacement, the lowest degree that retains a genuine per-unit nonlinearity. For calibration sets positive-margin separable in the lifted space, we formulate quadratic replacement as a linear separation problem, yielding necessary and sufficient conditions for calibration-lossless replacement and a constructive algorithm for the coefficients. When the positive-margin condition fails (often because a few near-boundary or misclassified calibration samples bring the lifted hulls into contact), we extend the same geometric framework via reduced convex hulls and Lagrangian-dual soft-margin relaxations. These cap the weight any single sample can carry, converting the problem into smaller convex quadratic programs that yield approximately feasible coefficients with high empirical agreement on calibration-set decisions. In particular, at the maximal weight cap μ = 1, the reduced-convex-hull relaxation reduces to standard convex-hull separation; the relaxation thus continuously extends the positive-margin exact theory. Under CKKS, the quadratic replacement matches plaintext top-1 accuracy on multiple benchmarks, running 3.7–4.1× faster than Remez-7 in the activation module and 1.18–1.68× faster end-to-end.
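The reduced-convex-hull relaxation described in the abstract can be sketched with the standard closest-pair formulation: each class is represented by a convex combination of its lifted calibration points in which no weight exceeds μ, and the hyperplane normal is the difference between the two closest such points. As a swapped-in solver, the sketch uses Frank-Wolfe, whose linear step over a capped simplex is a greedy fill; it is not the paper's QP solver and omits the soft-margin variant. The inputs `P` and `N` (lifted points whose original decisions are positive and negative) and all function names are hypothetical.

```python
import numpy as np

def capped_simplex_lmo(grad, mu):
    """Minimize grad . s over {s : sum(s) = 1, 0 <= s_i <= mu} by greedy filling."""
    s = np.zeros_like(grad, dtype=float)
    remaining = 1.0
    for i in np.argsort(grad):                    # cheapest coordinates first
        take = min(mu, remaining)
        s[i] = take
        remaining -= take
        if remaining <= 1e-12:
            break
    return s

def rch_separator(P, N, mu, iters=2000):
    """Frank-Wolfe on min ||P^T a - N^T b||^2 with a, b in mu-capped simplices.

    Returns the hyperplane (w, bias) through the midpoint of the closest pair of
    reduced-hull points, plus the sample weights a (for P) and b (for N).
    """
    m, n = len(P), len(N)
    if mu * m < 1 or mu * n < 1:
        raise ValueError("mu must be at least 1 / (samples in each class)")
    a = capped_simplex_lmo(np.zeros(m), mu)       # any feasible starting point
    b = capped_simplex_lmo(np.zeros(n), mu)
    for _ in range(iters):
        r = P.T @ a - N.T @ b                     # current difference p - q
        sa = capped_simplex_lmo(2 * P @ r, mu)    # gradient w.r.t. a is 2 P r
        sb = capped_simplex_lmo(-2 * N @ r, mu)   # gradient w.r.t. b is -2 N r
        u = P.T @ (sa - a) - N.T @ (sb - b)       # change in r along the FW direction
        uu = u @ u
        if uu <= 1e-18:                           # FW gap 2 r.u is zero: current point optimal
            break
        step = np.clip(-(r @ u) / uu, 0.0, 1.0)   # exact line search for the quadratic
        a = a + step * (sa - a)
        b = b + step * (sb - b)
    p, q = P.T @ a, N.T @ b
    w = p - q
    bias = -w @ (p + q) / 2.0                     # hyperplane through the midpoint of p and q
    return w, bias, a, b
```

With `mu=1.0` the caps are inactive and the routine separates ordinary convex hulls, matching the μ = 1 remark above; shrinking `mu` limits how far a single outlier can drag its class's hull point, at the cost of possibly misclassifying that outlier. Feasibility requires μ ≥ 1/(number of samples in each class).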

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript presents a decision-aware approach to replacing ReLU activations with quadratic polynomials in single-hidden-layer MLPs for fully homomorphic encryption (FHE) inference. Given a trained model and calibration set, it formulates the problem as finding quadratic coefficients that preserve the decisions on the calibration set. For cases where the lifted-space calibration points are positive-margin separable, this is cast as a linear separation problem with necessary and sufficient conditions and a constructive algorithm. When separability fails, reduced convex hulls and soft-margin relaxations with a weight cap μ are employed to find approximate coefficients. Experiments under the CKKS scheme show that the resulting quadratics achieve the same top-1 accuracy as the plaintext model on benchmarks while providing speedups compared to higher-degree polynomial replacements.

Significance. The geometric formulation in the lifted space offers a clean, constructive method for calibration-lossless replacement that is independent of empirical fitting constants from prior work. The extension via reduced convex hulls provides a continuous family of relaxations controlled by μ. The reported 3.7–4.1× speedup in the activation module and 1.18–1.68× end-to-end speedup are notable. The method's focus on decision preservation rather than uniform approximation is a useful perspective. The stress-test concern regarding generalization from calibration to test sets does not land, as the manuscript reports matching accuracy on the benchmarks.

minor comments (3)
  1. [Abstract] The specific benchmarks, models, and calibration-set sizes used for the accuracy and timing claims are not named; §4 should list them explicitly with dataset references.
  2. [§3] The definition of the lifted feature map and the resulting quadratic form should be given as numbered equations in §3 before the separation formulation is stated.
  3. Figure captions for the geometric illustrations should explicitly reference the value of μ used in each panel.

Simulated Authors' Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive and constructive review, which accurately captures the geometric formulation, the role of reduced convex hulls, and the reported speedups under CKKS. The recommendation of minor revision is noted; we will incorporate any editorial suggestions in the revised version.

Circularity Check

0 steps flagged

No significant circularity in geometric separation formulation


The paper derives quadratic ReLU replacement by lifting activations to a space where decision preservation on a calibration set reduces to linear separability (or its convex-hull/soft-margin relaxations). This construction directly yields necessary-and-sufficient conditions and coefficients from the separation problem itself; no parameter is fitted on one subset and then renamed as a prediction on a related quantity, no self-citation supplies a load-bearing uniqueness theorem, and no ansatz is smuggled in. The weight cap μ is an explicit design parameter, not a hidden dependency. Empirical matching on benchmarks is presented as validation, not as part of the derivation chain. The method is therefore self-contained.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The claim rests on the geometric properties of the lifted activation space and on the modeling choice that decision preservation on a calibration set is the relevant success metric; no new physical entities are introduced.

free parameters (1)
  • μ
    Weight cap on individual calibration samples in the reduced-convex-hull relaxation; set to 1 in the limiting case that recovers standard convex-hull separation.
axioms (2)
  • domain assumption: The network is a single-hidden-layer ReLU MLP
    The lifting and separation arguments are developed specifically for this architecture.
  • domain assumption: Preserving decisions on the calibration set is the operative correctness criterion
    The entire replacement procedure is defined with respect to this finite set rather than pointwise approximation error.

pith-pipeline@v0.9.0 · 5841 in / 1506 out tokens · 40953 ms · 2026-05-25T06:00:41.221397+00:00

