DRAMatic Speedup: Accelerating HE Operations on a Processing-in-Memory System
Pith reviewed 2026-05-16 04:58 UTC · model grok-4.3
The pith
DRAMatic runs foundational homomorphic encryption operations 334 times faster on UPMEM processing-in-memory hardware than prior PIM implementations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DRAMatic implements the basic arithmetic of homomorphic encryption on UPMEM PIM by combining residue number system encoding with number-theoretic transforms, delivering a 334 times speedup relative to previous PIM-based HE while narrowing the gap to conventional libraries such as SEAL, subject to remaining data-transfer and multiplication bottlenecks.
What carries the argument
Residue number system combined with number-theoretic transforms, executed across the parallel processing units embedded in UPMEM PIM memory modules.
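To make the two techniques concrete, here is a minimal Python sketch; it is not DRAMatic's code. It multiplies two polynomials in Z_Q[x]/(x^N + 1) by splitting each coefficient into residues modulo small primes (RNS), multiplying each residue channel with a negacyclic NTT, and recombining the channels with the Chinese remainder theorem. The degree N = 8, the primes 17, 97, and 113, and every function name are toy assumptions chosen for illustration; secure HE parameters are far larger.

```python
# Toy sketch of RNS + negacyclic NTT polynomial multiplication.
# All parameters and names are illustrative assumptions, not DRAMatic's.
from math import prod

N = 8                      # toy ring degree (power of two)
MODULI = [17, 97, 113]     # toy primes, each congruent to 1 mod 2N
Q = prod(MODULI)           # composite coefficient modulus


def find_psi(q, n):
    """Return a primitive 2n-th root of unity modulo prime q."""
    for x in range(2, q):
        g = pow(x, (q - 1) // (2 * n), q)
        if pow(g, n, q) == q - 1:   # g^n == -1, so g has order exactly 2n
            return g
    raise ValueError("no 2n-th root of unity; need q = 1 mod 2n")


def ntt(a, omega, q):
    """Recursive Cooley-Tukey NTT; omega is a primitive len(a)-th root."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def negacyclic_mul(a, b, q):
    """Multiply a and b in Z_q[x]/(x^n + 1) using the NTT."""
    n = len(a)
    psi = find_psi(q, n)
    omega = psi * psi % q
    # Pre-scale by powers of psi so the cyclic NTT yields a negacyclic product.
    a_t = ntt([x * pow(psi, i, q) % q for i, x in enumerate(a)], omega, q)
    b_t = ntt([x * pow(psi, i, q) % q for i, x in enumerate(b)], omega, q)
    c_t = [x * y % q for x, y in zip(a_t, b_t)]
    c = ntt(c_t, pow(omega, -1, q), q)          # inverse transform, unscaled
    n_inv, psi_inv = pow(n, -1, q), pow(psi, -1, q)
    return [x * n_inv * pow(psi_inv, i, q) % q for i, x in enumerate(c)]


def crt(residues):
    """Recombine one coefficient from its residues modulo each prime."""
    total = 0
    for r, q in zip(residues, MODULI):
        m = Q // q
        total += r * m * pow(m, -1, q)
    return total % Q


def rns_poly_mul(a, b):
    """Multiply in Z_Q[x]/(x^N + 1), one independent channel per prime."""
    channels = [
        negacyclic_mul([x % q for x in a], [x % q for x in b], q)
        for q in MODULI
    ]
    return [crt([ch[j] for ch in channels]) for j in range(N)]


def schoolbook_mul(a, b, q):
    """Reference O(n^2) negacyclic product, used only to check the result."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1       # x^n wraps around to -1
            c[(i + j) % n] = (c[(i + j) % n] + sign * a[i] * b[j]) % q
    return c


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [Q - 1, 5, 0, 0, 0, 0, 0, 9]
assert rns_poly_mul(a, b) == schoolbook_mul(a, b, Q)
```

Each residue channel only ever handles numbers smaller than its own prime, and the channels never interact until the final recombination. In principle that independence allows them to be assigned to separate processing units; how DRAMatic actually partitions the work is not stated in this summary.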
If this is right
- Foundational HE operations become substantially faster when executed on PIM hardware.
- DRAMatic reduces the runtime and energy gap between PIM implementations and standard libraries such as SEAL.
- Data-transfer overhead and limited multiplication throughput currently constrain further gains on UPMEM PIM.
- Hardware extensions to UPMEM PIM could improve support for the large parameters and arithmetic patterns of HE.
Where Pith is reading between the lines
- PIM architectures could become practical for confidential cloud workloads if data movement costs are lowered.
- The same residue-number and transform optimizations may accelerate other memory-bound cryptographic tasks beyond HE.
- Future PIM designs with faster on-module multiplication units would better match the needs of polynomial-based encryption schemes.
Load-bearing premise
The UPMEM PIM hardware can support the large parameters required for secure homomorphic evaluations with acceptable data-transfer overhead.
What would settle it
Measure end-to-end runtime and energy for a complete secure homomorphic evaluation of a non-trivial circuit on UPMEM PIM using DRAMatic versus the identical circuit run with SEAL on a standard CPU.
Original abstract
Homomorphic encryption (HE) is a promising technology for confidential cloud computing, as it allows computations on encrypted data. However, HE is computationally expensive and often memory-bound on conventional computer architectures. Processing-in-Memory (PIM) is an alternative hardware architecture that integrates processing units and memory on the same chip or memory module. PIM enables higher memory bandwidth than conventional architectures and could thus be suitable for accelerating HE. We present DRAMatic, which implements operations foundational to HE on UPMEM PIM -- a programmable general-purpose PIM system developed by UPMEM. DRAMatic incorporates many arithmetic optimizations, including residue number system and number-theoretic transform techniques, and can support the large parameters required for secure homomorphic evaluations. It achieves a 334 times speed-up compared to previous HE implementations on UPMEM PIM. We also evaluate DRAMatic against Microsoft SEAL, a popular open-source HE library, regarding both runtime and energy efficiency. The results show that DRAMatic significantly closes the gap between Microsoft SEAL and HE implementations on UPMEM PIM. However, we also show that DRAMatic is currently constrained by data transfer overhead and limited multiplication performance on UPMEM PIM hardware. Finally, we discuss potential hardware extensions to UPMEM PIM.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents DRAMatic, an implementation of foundational homomorphic encryption operations on UPMEM PIM hardware. It incorporates RNS and NTT optimizations to support large secure parameters, reports a 334x speedup over prior HE implementations on the same platform, and compares runtime and energy efficiency against Microsoft SEAL, while noting constraints from data-transfer overhead and limited multiplication performance.
Significance. If the performance numbers hold with full inclusion of overheads, the work demonstrates that PIM can accelerate memory-bound HE workloads and narrows the gap to conventional libraries such as SEAL, offering a concrete data point for hardware-software co-design in confidential computing.
major comments (1)
- [Abstract] The headline 334x speedup versus previous UPMEM PIM HE implementations is presented without a breakdown of compute time versus host-to-PIM data-transfer time for the evaluated parameter sets. Because the abstract itself identifies data transfer overhead as a binding constraint, it is unclear whether the reported figure already folds transfers in or whether they remain negligible; this directly affects whether the central acceleration claim is supported.
minor comments (1)
- The experimental section should include error bars, standard deviations, or raw timing data for all runtime and energy figures to permit assessment of measurement variability.
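One way to report the variability this comment asks for is to repeat each measurement and publish the mean with a spread. The sketch below is an assumption about how that could be done, not the paper's methodology; the timing samples are invented placeholders and the function name `summarize` is hypothetical.

```python
# Illustrative only: the timing samples below are made up, not from the paper.
import math
import statistics


def summarize(samples_s):
    """Return mean, sample standard deviation, and an approximate 95% CI half-width."""
    n = len(samples_s)
    mean = statistics.mean(samples_s)
    stdev = statistics.stdev(samples_s)          # sample (n - 1) standard deviation
    # Normal approximation; a Student-t quantile is more appropriate for small n.
    ci95 = 1.96 * stdev / math.sqrt(n)
    return {"n": n, "mean": mean, "stdev": stdev, "ci95": ci95}


hypothetical_runs_s = [0.412, 0.398, 0.405, 0.421, 0.409]
stats = summarize(hypothetical_runs_s)
print(f"{stats['mean']:.3f} s +/- {stats['ci95']:.3f} s (n={stats['n']})")
```

The same summary applies to energy readings; publishing the raw samples alongside it lets readers recompute the interval with a method of their choice.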
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the single major comment below and will revise the paper to improve clarity on the reported speedup.
Point-by-point responses
- Referee: [Abstract] The headline 334x speedup versus previous UPMEM PIM HE implementations is presented without a breakdown of compute time versus host-to-PIM data-transfer time for the evaluated parameter sets. Because the abstract itself identifies data transfer overhead as a binding constraint, it is unclear whether the reported figure already folds transfers in or whether they remain negligible; this directly affects whether the central acceleration claim is supported.
Authors: We agree that the abstract would benefit from greater precision on this point. The 334x speedup is measured for the full end-to-end HE operations on UPMEM PIM, which includes both on-PIM computation and the host-to-PIM data transfers required to move operands and results. This measurement approach matches the methodology used in the prior UPMEM HE implementations we compare against, ensuring an apples-to-apples comparison. To eliminate any ambiguity, we will revise the abstract to explicitly state that the reported speedup incorporates data-transfer overhead. We will also add a table or figure in the evaluation section that breaks down compute time versus transfer time for the evaluated parameter sets, directly supporting the central claim.
revision: yes
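The distinction at issue in this exchange can be illustrated with a short calculation. The sketch below uses invented timings, not figures from the paper, and the `Timing` class and `speedups` function are hypothetical names. It shows how a compute-only speedup and an end-to-end speedup can diverge once transfer time is counted.

```python
# Illustrative only: all timings are invented, not measurements from the paper.
from dataclasses import dataclass


@dataclass
class Timing:
    transfer_s: float  # host <-> PIM copy time
    compute_s: float   # on-PIM kernel time

    @property
    def total_s(self) -> float:
        return self.transfer_s + self.compute_s


def speedups(baseline: Timing, candidate: Timing) -> dict:
    """Compare two implementations with and without transfer time included."""
    return {
        "compute_only": baseline.compute_s / candidate.compute_s,
        "end_to_end": baseline.total_s / candidate.total_s,
    }


# Hypothetical case: kernels get much faster, transfer cost stays the same.
baseline = Timing(transfer_s=2.0, compute_s=98.0)
candidate = Timing(transfer_s=2.0, compute_s=0.5)
result = speedups(baseline, candidate)
print(f"compute-only: {result['compute_only']:.0f}x, "
      f"end-to-end: {result['end_to_end']:.0f}x")
```

With an unchanged transfer cost, the end-to-end ratio can never exceed the baseline total divided by the transfer time, which is why a per-parameter-set breakdown makes a headline speedup easier to interpret.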
Circularity Check
No significant circularity: empirical implementation with direct runtime measurements
full rationale
The paper is an engineering report on implementing HE operations (NTT, RNS, etc.) on UPMEM PIM hardware. It reports measured speedups (334x vs prior UPMEM HE) and comparisons to SEAL via direct execution timings and energy figures. No mathematical derivations, fitted parameters, or predictions are presented; the central claims rest on hardware benchmarks against external baselines. The abstract notes data-transfer constraints explicitly, confirming the work does not hide or redefine its own inputs as outputs. No self-citation chains, ansatzes, or uniqueness theorems are invoked as load-bearing steps. This is a standard non-circular implementation paper.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: UPMEM PIM hardware provides substantially higher effective memory bandwidth than conventional CPU/GPU architectures for the access patterns of HE arithmetic.