Buddy-RAM: Improving the Performance and Efficiency of Bulk Bitwise Operations Using DRAM

Amirali Boroumand; Donghyuk Lee; Hasan Hassan; Jeremie Kim; Michael A. Kozuch; Onur Mutlu; Phillip B. Gibbons; Thomas Mullins; Todd C. Mowry; Vivek Seshadri

arXiv: 1611.09988 · v1 · pith:7OZ3SWBS · submitted 2016-11-30 · 💻 cs.AR

classification 💻 cs.AR

keywords bitwise, operations, DRAM, Buddy, bulk, perform, bitmap, chip

abstract

Bitwise operations are an important component of modern-day programming. Many widely used data structures (e.g., bitmap indices in databases) rely on fast bitwise operations on large bit vectors to achieve high performance. Unfortunately, in existing systems, regardless of the underlying architecture (e.g., CPU, GPU, FPGA), the throughput of such bulk bitwise operations is limited by the available memory bandwidth. We propose Buddy, a new mechanism that exploits the analog operation of DRAM to perform bulk bitwise operations completely inside the DRAM chip. Buddy consists of two components. First, simultaneous activation of three DRAM rows that are connected to the same set of sense amplifiers enables us to perform bitwise AND and OR operations. Second, the inverters present in each sense amplifier enable us to perform bitwise NOT operations, with modest changes to the DRAM array. These two components make Buddy functionally complete. Our implementation of Buddy largely exploits the existing DRAM structure and interface, and incurs low overhead (1% of DRAM chip area). Our evaluations based on SPICE simulations show that, across seven commonly used bitwise operations, Buddy provides between 10.9X and 25.6X improvement in raw throughput and between 25.1X and 59.5X reduction in energy consumption. We evaluate three real-world data-intensive applications that exploit bitwise operations: 1) bitmap indices, 2) BitWeaving, and 3) a bitvector-based implementation of sets. Our evaluations show that Buddy significantly outperforms the state of the art.
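
The logic behind the two components can be illustrated with a minimal software sketch. It assumes (a detail not stated in the abstract) that activating three rows at once drives every bitline to the bitwise majority of the three cells, so that a third "control" row preset to all 0s yields AND and one preset to all 1s yields OR; NOT is modeled simply as reading the inverted value. Rows are Python ints used as bit vectors. ROW_BITS, all function names, and the example bitmaps are hypothetical, and the sketch models only the Boolean behavior, not DRAM timing, charge sharing, or the real command interface.

```python
"""Hypothetical software model of Buddy-style bulk bitwise operations.

Assumption: a triple-row activation leaves each bit as the MAJORITY of the
three activated cells and overwrites all three rows with that result.
"""

ROW_BITS = 64  # hypothetical row width, kept small for illustration
MASK = (1 << ROW_BITS) - 1


def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority: a result bit is 1 iff at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def triple_row_activate(rows: list[int], i: int, j: int, k: int) -> int:
    """Model a triple-row activation: the result overwrites all three rows."""
    result = majority(rows[i], rows[j], rows[k])
    rows[i] = rows[j] = rows[k] = result
    return result


def bulk_and(a: int, b: int) -> int:
    # Operands are copied into scratch rows so the originals survive the
    # destructive activation; control row of all 0s: MAJ(A, B, 0) = A AND B.
    scratch = [a, b, 0]
    return triple_row_activate(scratch, 0, 1, 2)


def bulk_or(a: int, b: int) -> int:
    # Control row of all 1s: MAJ(A, B, 1) = A OR B.
    scratch = [a, b, MASK]
    return triple_row_activate(scratch, 0, 1, 2)


def bulk_not(a: int) -> int:
    # Stands in for reading the inverted value from the sense amplifier.
    return ~a & MASK


def bulk_xor(a: int, b: int) -> int:
    # Built only from AND/OR/NOT: A XOR B = (A OR B) AND NOT (A AND B).
    return bulk_and(bulk_or(a, b), bulk_not(bulk_and(a, b)))


if __name__ == "__main__":
    # Hypothetical bitmap index over 8 records (bit i set = record i matches).
    is_active = 0b1011_0110
    is_premium = 0b0011_1100
    print(f"{bulk_and(is_active, is_premium):08b}")  # active AND premium
    print(f"{bulk_xor(is_active, is_premium):08b}")  # exactly one of the two
```

Building XOR from several AND/OR/NOT steps shows why those three operations together are functionally complete, and also why compound operations such as XOR cost more than one in-DRAM step in this model.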
