A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM

arxiv: 2604.04773 · v1 · submitted 2026-04-06 · 💻 cs.AR

Siddhartha Raman Sundara Raman, Siyuan Ma, Lizy Kurian John

Pith reviewed 2026-05-10 18:53 UTC · model grok-4.3

classification 💻 cs.AR

keywords compute-in-memory · DRAM PIM · power delivery network · voltage droop · PDN challenges · near-bank computing · memory architecture · mitigation strategies

The pith

DRAM-based compute-in-memory creates non-traditional current demands that require power delivery network aware designs for reliable scaling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper surveys power delivery challenges in DRAM-based compute-in-memory systems. It introduces a taxonomy that sorts current behaviors by whether they are bursty or sustained over time and localized or distributed across space. This framework shows how techniques such as multi-row activation, row-buffer operations, and near-bank compute units produce voltage droops, IR drops, and thermal hotspots. The survey then examines mitigation approaches that reuse existing DRAM mechanisms including timing constraints, controller scheduling, data placement, and hierarchical power management. A sympathetic reader cares because these issues determine whether PIM systems can scale without reliability failures in real hardware.
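
One of the mitigation families named above, timing constraints enforced by the memory controller, can be sketched as a sliding-window cap on row activations, in the spirit of the four-activate window (tFAW) in JEDEC DDR timing. This is a minimal sketch under stated assumptions: the class name `ActivationWindowLimiter`, the limit of four, and the 40 ns window are illustrative choices and do not come from the paper.

```python
from collections import deque


class ActivationWindowLimiter:
    """Allow at most `max_acts` activations in any `window_ns` span.

    Hypothetical helper for illustration; not an API from the paper.
    """

    def __init__(self, max_acts: int, window_ns: float):
        self.max_acts = max_acts
        self.window_ns = window_ns
        self._recent = deque()  # issue times of activations still in the window

    def issue(self, now_ns: float) -> float:
        """Return the time at which an activation requested at now_ns is issued."""
        # Commands are issued in order, so never schedule before the last one.
        t = max(now_ns, self._recent[-1]) if self._recent else now_ns
        # Forget activations that have already left the window at time t.
        while self._recent and t - self._recent[0] >= self.window_ns:
            self._recent.popleft()
        # Window full: wait until the oldest activation ages out.
        if len(self._recent) >= self.max_acts:
            t = self._recent[0] + self.window_ns
            self._recent.popleft()
        self._recent.append(t)
        return t


# Six back-to-back requests with an assumed limit of 4 per 40 ns.
limiter = ActivationWindowLimiter(max_acts=4, window_ns=40.0)
print([limiter.issue(0.0) for _ in range(6)])
# -> [0.0, 0.0, 0.0, 0.0, 40.0, 40.0]
```

Spreading the fifth and sixth activations into the next window caps how many rows draw activation current at once, which is the kind of burst the survey associates with voltage droop.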

Core claim

By classifying PIM-induced current patterns along temporal (burst versus sustained) and spatial (localized versus distributed) dimensions, the paper shows that representative DRAM PIM mechanisms stress the power delivery network through concurrent activations and large-scale parallel execution, producing voltage droop, IR drop, and thermal hotspots. It argues that DRAM-specific mitigations drawn from architectural timing, memory controller scheduling, data placement, and bank- and vault-level power management can address these stresses, establishing that PDN-aware design is necessary for scalable and reliable DRAM-based PIM systems.
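
As a first-order illustration of why the two axes matter, a textbook lumped model of the power delivery network (an assumption for this note, not a formula taken from the paper) writes the supply voltage seen at a bank as:

```latex
\[
  V_{\text{bank}}(t) \;=\; V_{DD} \;-\; R_{\text{PDN}}\, I(t) \;-\; L_{\text{PDN}}\, \frac{dI(t)}{dt}
\]
```

Here $I(t)$ is the current drawn by the active region, and $R_{\text{PDN}}$ and $L_{\text{PDN}}$ are hypothetical effective resistance and inductance of the delivery path. Under this simplification, sustained demand mainly enlarges the resistive term (IR drop), burst demand mainly enlarges the inductive term (transient droop), and localized demand pushes the same current through a smaller part of the grid, which raises the effective resistance it sees.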

What carries the argument

A unified taxonomy that classifies PIM-induced current behavior along temporal (burst vs. sustained) and spatial (localized vs. distributed) dimensions to map techniques to their PDN stresses.
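
The two-axis taxonomy can be written down as a small data structure. The sketch below is illustrative only: the type and function names are hypothetical, the two quadrant assignments are the ones stated in the simulated rebuttal on this page, and the rule that links a quadrant to a likely stress is a simplifying assumption rather than a result from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Temporal(Enum):
    BURST = "burst"
    SUSTAINED = "sustained"


class Spatial(Enum):
    LOCALIZED = "localized"
    DISTRIBUTED = "distributed"


@dataclass(frozen=True)
class CurrentSignature:
    """One cell of the 2x2 taxonomy grid."""
    temporal: Temporal
    spatial: Spatial


# Quadrant assignments as stated in the simulated rebuttal on this page.
TECHNIQUES = {
    "multi-row activation": CurrentSignature(Temporal.BURST, Spatial.LOCALIZED),
    "row-buffer operations": CurrentSignature(Temporal.SUSTAINED, Spatial.DISTRIBUTED),
}


def likely_stresses(sig: CurrentSignature) -> list[str]:
    """ASSUMPTION: a coarse rule of thumb, not taken from the paper."""
    stresses = []
    if sig.temporal is Temporal.BURST:
        stresses.append("voltage droop")  # fast current steps
    else:
        stresses.append("IR drop")  # steady resistive loss
    if sig.temporal is Temporal.SUSTAINED and sig.spatial is Spatial.LOCALIZED:
        stresses.append("thermal hotspot")  # heat concentrated in one region
    return stresses


for name, sig in TECHNIQUES.items():
    print(name, "->", likely_stresses(sig))
```

Near-bank compute is left out of the table on purpose: the rebuttal describes it as "high-parallelism" rather than placing it in a single quadrant.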

Load-bearing premise

The representative PIM techniques surveyed capture the main current-demand patterns that will appear in future DRAM-based PIM deployments.

What would settle it

A measurement or simulation of a large-scale DRAM PIM system using multi-row activation and near-bank compute that runs heavy parallel workloads without producing significant voltage droops, IR drops, or thermal hotspots.

Figures

Figures reproduced from arXiv: 2604.04773 by Lizy Kurian John, Siddhartha Raman Sundara Raman, Siyuan Ma.

Original abstract

Compute-in-memory (PIM) mitigates the memory wall by performing computation within memory, reducing data movement and improving energy efficiency. DRAM-based PIM is particularly attractive due to its high density, mature manufacturing ecosystem, and compatibility with existing systems. Recent works exploit multiple levels of the DRAM hierarchy - including subarrays, banks, and 3D-stacked organizations - to enable in-memory computation using mechanisms such as multi-row activation, row-buffer operations, and near-bank compute units. However, these approaches introduce non-traditional current demand patterns that challenge the power delivery network (PDN). This paper surveys PDN challenges in DRAM-based PIM systems and proposes a unified taxonomy that characterizes PIM-induced current behavior along temporal (burst vs. sustained) and spatial (localized vs. distributed) dimensions. Using this framework, we analyze how representative PIM techniques stress the PDN through bursty activations, multi-row concurrency, and large-scale parallel execution, leading to voltage droop, IR drop, and thermal hotspots. We further discuss DRAM-specific mitigation strategies leveraging existing architectural and circuit-level mechanisms, including timing constraints, memory controller scheduling, data placement, and bank- and vault-level power management. This survey highlights the importance of PDN-aware design for scalable and reliable DRAM-based PIM systems and outlines key future research directions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This is a survey that organizes existing DRAM PIM work around a temporal-spatial current taxonomy but adds no new measurements or derivations of its own.

The main point is that this paper surveys power delivery network issues in DRAM-based compute-in-memory and proposes a simple taxonomy to classify current demands as burst versus sustained and localized versus distributed. It uses that frame to discuss how techniques like multi-row activation, row-buffer ops, and near-bank compute create voltage droop, IR drop, and thermal problems, then lists standard mitigations such as controller scheduling, timing constraints, and bank-level power management.

The taxonomy itself is the clearest new element, as a synthesis that pulls scattered observations into one grid. The paper does a clean job of connecting the representative mechanisms to PDN stress and of pointing out that existing DRAM features can help without requiring entirely new hardware. The discussion of future directions stays practical and focused on data placement and vault-level controls.

The soft spots are straightforward and proportional. The analysis stays qualitative with no fresh simulations, quantitative summaries, or error bars from the cited works, so the severity of the claimed problems rests on whatever the prior papers already showed. The three example techniques may not exhaust the space of possible current patterns, especially if future PIM designs use different subarray concurrency or vault organizations that fall outside the four quadrants. Without a coverage argument or additional cases, the taxonomy and mitigation recommendations could turn out incomplete for some deployments.

This paper is for engineers and researchers already working on memory systems or AI accelerators who need a quick map of the power issues rather than a primary result. It can save time for someone entering the area but will not replace reading the original measurements.

It deserves a serious referee because the taxonomy is a reasonable organizing tool, the citations appear solid, and the topic matters for energy-efficient designs, even though the contribution is organizational rather than empirical.

Referee Report

1 major / 2 minor

Summary. This survey paper examines power delivery network (PDN) challenges arising in DRAM-based compute-in/near-memory (PIM) systems. It proposes a taxonomy that classifies PIM-induced current demands along temporal (burst vs. sustained) and spatial (localized vs. distributed) dimensions. The authors apply the taxonomy to analyze representative techniques—multi-row activation, row-buffer operations, and near-bank compute—and the resulting stresses including voltage droop, IR drop, and thermal hotspots. The manuscript reviews mitigation approaches based on timing constraints, memory-controller scheduling, data placement, and bank/vault-level power management, and concludes by stressing the need for PDN-aware design in scalable DRAM-PIM systems while listing future research directions.

Significance. If the taxonomy proves robust, the survey could serve as a useful organizing lens for designers and researchers working on DRAM-PIM, encouraging earlier consideration of power-delivery constraints. Its primary contribution is synthesis of existing literature rather than new quantitative results or proofs; therefore its impact will depend on how comprehensively and accurately it maps the space of current-demand patterns.

major comments (1)

The central claim that the proposed taxonomy enables characterization of PDN stresses for scalable DRAM-PIM systems rests on the representativeness of the three chosen mechanisms (multi-row activation, row-buffer operations, near-bank compute). No coverage metric, exhaustive enumeration of alternative organizations (e.g., subarray-level logic or different vault configurations), or argument that these techniques exhaust the relevant current-signature space is supplied. This assumption is load-bearing for the taxonomy's claimed utility and for the derived mitigation recommendations.

minor comments (2)

The abstract and introduction would benefit from an explicit statement of how many PIM techniques are analyzed and which sections contain the detailed mapping onto the taxonomy quadrants.
Several mitigation strategies are described qualitatively; adding even brief pointers to quantitative results or simulation data from the cited works would improve clarity without altering the survey nature of the manuscript.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our survey. We address the single major comment below, acknowledging the gap in explicit coverage discussion while strengthening the manuscript's presentation of the taxonomy's scope.

Referee: The central claim that the proposed taxonomy enables characterization of PDN stresses for scalable DRAM-PIM systems rests on the representativeness of the three chosen mechanisms (multi-row activation, row-buffer operations, near-bank compute). No coverage metric, exhaustive enumeration of alternative organizations (e.g., subarray-level logic or different vault configurations), or argument that these techniques exhaust the relevant current-signature space is supplied. This assumption is load-bearing for the taxonomy's claimed utility and for the derived mitigation recommendations.

Authors: We agree that the manuscript does not supply an exhaustive enumeration, coverage metric, or formal argument that the three mechanisms exhaust the current-signature space. These techniques were selected because they are prominent in the surveyed literature and map distinctly onto the taxonomy axes (multi-row activation as burst-localized, row-buffer operations as sustained-distributed, and near-bank compute as high-parallelism). In the revised version we will add a new subsection following the taxonomy definition that (1) states the selection rationale, (2) explicitly lists example alternative organizations such as subarray-level logic and varied HBM vault configurations, and (3) clarifies that the taxonomy is offered as a general organizing lens rather than a complete enumeration. This makes the scope and limitations transparent while preserving the survey's synthesis contribution.

revision: yes

Circularity Check

0 steps flagged

No circularity: survey taxonomy and analysis are self-contained organizational contributions

This paper is a survey that proposes a taxonomy along temporal and spatial dimensions to characterize current-demand patterns in DRAM-based PIM and then applies it to representative techniques drawn from prior literature. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The central claim that PDN-aware design is important rests on qualitative analysis of existing mechanisms rather than any reduction to quantities defined by the paper's own inputs or self-citations. The taxonomy is presented as a new organizational lens, not as a result forced by or equivalent to the surveyed data. External citations are to independent prior works and do not form a load-bearing self-citation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

As a survey the paper rests on standard computer-architecture assumptions about DRAM hierarchy and power delivery networks; no free parameters, ad-hoc axioms, or invented entities are introduced in the abstract.

pith-pipeline@v0.9.0 · 5540 in / 1094 out tokens · 38319 ms · 2026-05-10T18:53:26.408278+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

subarray-level PIM... bank-level PIM... 3D level PIM... mitigation strategies leveraging existing architectural and circuit-level mechanisms

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks
cs.AR 2026-05 unverdicted novelty 5.0

NEM-GNN is a scalable DAC/ADC-less processing-in-memory architecture for GNNs that uses early compute termination, reconfigurable SoC pre-computation, and compute-as-soon-as-ready broadcast execution to deliver large ...

Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · cited by 1 Pith paper · 1 internal anchor


[1] [1]

A scalable processing-in-memory accelerator for parallel graph processing,

J. Ahn, S. Yoo, O. Mutlu, and K. Choi, “A scalable processing-in-memory accelerator for parallel graph processing,” inProceedings of the 42nd International Symposium on Computer Architecture (ISCA), 2015, pp. 105–117

work page 2015

[2] [2]

Subarray-aware scheduling for pim systems,

Anonymous, “Subarray-aware scheduling for pim systems,” 2024

work page 2024

[3] [3]

Toward energy-efficient stt-mram-based near memory computing design for embedded systems,

K. Asifuzzamanet al., “Toward energy-efficient stt-mram-based near memory computing design for embedded systems,”ACM Journal on Emerging Technolo- gies in Computing Systems, 2026

work page 2026

[4] [4]

15.6 e-chimera: A scalable sram-based ising macro with enhanced-chimera topology for solving combinatorial optimization problems within memory,

J. Bae, C. Shim, and B. Kim, “15.6 e-chimera: A scalable sram-based ising macro with enhanced-chimera topology for solving combinatorial optimization problems within memory,” in2024 IEEE International Solid-State Circuits Conference (ISSCC), vol. 67, 2024, pp. 286–288

work page 2024

[5] A. Biswas and A. P. Chandrakasan, “Conv-SRAM: An energy-efficient SRAM with in-memory dot-product computation for low-power convolutional neural networks,” IEEE Journal of Solid-State Circuits, vol. 54, no. 1, pp. 217–230, 2019.

[6] P. K. R. Boppidi, S. S. Raman, H. Renuka, and S. Kundu, “Pt/Cu:ZnO/Nb:STO memristive dual port for cache memory applications,” AIP Conference Proceedings, vol. 2265, no. 1, p. 030212, Nov. 2020. [Online]. Available: https://doi.org/10.1063/5.0016597

[7] I. Boybat, M. Le Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, and E. Eleftheriou, “Neuromorphic computing with PCM-based crossbar arrays,” Nature Communications, vol. 9, no. 1, p. 2514, 2018.

[8] K. Chandrasekar, C. Weis, Y. Li, B. Akesson, O. Naji, M. Jung, N. Wehn, and K. Goossens, “DRAMPower: Open-source DRAM power & energy estimation tool,” in Proceedings of the 2012 IEEE International Conference on High Performance Computing and Simulation (HPCS), 2012, pp. 64–69.

[9] K. K.-W. Chang et al., “Understanding reduced-voltage operation in modern DRAM devices,” in SIGMETRICS, 2017.

[10] P. Chi, S. Li, C. Xu, T. Zhang, J. Gu, W. Jiang, X. Zhang, and Y. Xie, “PRIME: A novel processing-in-memory architecture for neural network computation in RRAM-based main memory,” in Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), 2016, pp. 27–39.

[11] F. Gao et al., “A survey of processing-in-memory: From fundamentals to real-world case studies,” arXiv preprint arXiv:2105.03814, 2021.

[12] F. Gao, G. Tziantzioulis, and D. Wentzlaff, “ComputeDRAM: In-memory compute using off-the-shelf DRAMs,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’52. New York, NY, USA: Association for Computing Machinery, 2019, pp. 100–113. [Online]. Available: https://doi.org/10.1145/3352460.3358260

[13] M. Gao, J. Pu, X. Yang, M. Horowitz, and C. Kozyrakis, “TETRIS: Scalable and efficient neural network acceleration with 3D memory,” in Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2017, pp. 751–764.

[14] S. Ghose et al., “What your DRAM power models are not telling you: Lessons from a detailed experimental study,” Proc. ACM Meas. Anal. Comput. Syst., vol. 2, no. 3, p. 24, 2018.

[15] S. Ghose, A. G. Yağlıkçı, R. Gupta et al., “What your DRAM power models are not telling you: Lessons from a detailed experimental study,” Proceedings of the ACM on Measurement and Analysis of Computing Systems (SIGMETRICS), vol. 2, no. 3, pp. 1–28, 2018.

[16] M. He et al., “Newton: A DRAM-maker’s accelerator-in-memory (AiM) architecture for machine learning,” in 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2020, pp. 834–847.

[17] JEDEC Solid State Technology Association, “DDR4 SDRAM standard (JESD79-4),” JEDEC, Tech. Rep., 2012. [Online]. Available: https://jedec.org

[18] A. Jeyasothy et al., “Neuromorphic computing with nanoscale resistive switching memory devices,” Nature Electronics, vol. 4, pp. 81–90, 2021.

[19] D.-H. Kim, J. Kung, S.-H. Chai, S. Yalamanchili, and S. Mukhopadhyay, “Neurocube: A programmable digital neuromorphic architecture with high-density 3D memory,” in Proceedings of the 43rd International Symposium on Computer Architecture (ISCA), 2016, pp. 380–392.

[20] Y. Kim, V. Seshadri, D. Lee, J. Liu, and O. Mutlu, “A case for exploiting subarray-level parallelism (SALP) in DRAM,” in ISCA, 2012.

[21] J. P. Kulkarni, S. R. Sundara Raman, S. Xie, and C.-P. Lo, “Unconventional computing using Ising accelerators,” Computer, vol. 58, no. 6, pp. 83–86, 2025.

[22] S. Lee, S.-h. Kang, J. Lee, H. Kim, E. Lee, S.-y. Seo, H. Yoon, S. Lee, K. Lim, H. Shin, J. Kim, S. O, A. Iyer, D. Wang, K. Sohn, and N. S. Kim, “Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product,” in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2021, pp. 43–56.

[23] N. Lepri, M. Baldo, P. Mannocci, A. Glukhov, V. Milo, and D. Ielmini, “Modeling and compensation of IR drop in crosspoint accelerators of neural networks,” IEEE Transactions on Electron Devices, vol. 69, no. 3, pp. 1575–1581, 2022.

[24] C. Li, D. Belkin, Y. Li, P. Yan, M. Hu, N. Ge, H. Sheng, H. Chang, C. Pao, J. M. Lin et al., “Analogue signal and image processing with large-scale RRAM crossbars,” Nature Electronics, vol. 1, no. 1, pp. 52–59, 2018.

[25] S. Li, D. Niu, K. T. Malladi, H. Zheng, B. Brennan, and Y. Xie, “DRISA: A DRAM-based reconfigurable in-situ accelerator,” in 2017 50th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2017, pp. 288–301.

[26] O. Mutlu and T. Moscibroda, “Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems,” in Proceedings of the 35th Annual International Symposium on Computer Architecture, ser. ISCA ’08. IEEE Computer Society, 2008, pp. 63–74.

[27] S. S. T. Nibhanupudi, S. R. S. Raman, and J. P. Kulkarni, “Phase transition material-assisted low-power SRAM design,” IEEE Transactions on Electron Devices, vol. 68, no. 5, pp. 2281–2288, 2021.

[28] S. S. T. Nibhanupudi, S. R. Sundara Raman, M. Cassé, L. Hutin, and J. P. Kulkarni, “Ultra-low-voltage UTBB-SOI-based, pseudo-static storage circuits for cryogenic CMOS applications,” IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 7, no. 2, pp. 201–208, 2021.

[29] A. G. Pavlidis and S. Memik, “Thermal-aware scheduling for 3D-stacked memory systems,” IEEE Transactions on Computers, 2023.

[30] S. R. S. Raman, “A review on non-volatile and volatile emerging memory technologies,” in Computer Memory and Data Storage, A. Seyedi, Ed. Rijeka: IntechOpen, 2024, ch. 3. [Online]. Available: https://doi.org/10.5772/intechopen.110617

[31] S. R. S. Raman, L. John, and J. P. Kulkarni, “Spark: Sparsity aware, low area, energy-efficient, near-memory architecture for accelerating linear programming problems,” in 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2025, pp. 99–112.

[32] S. R. S. Raman and J. P. Kulkarni, “ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute.” [Online]. Available: https://arxiv.org/abs/2602.14262

[34] S. R. S. Raman, S. S. T. Nibhanupudi, A. K. Saha, S. Gupta, and J. P. Kulkarni, “Threshold selector and capacitive coupled assist techniques for write voltage reduction in metal–ferroelectric–metal field-effect transistor,” IEEE Transactions on Electron Devices, vol. 68, no. 12, pp. 6132–6138, 2021.

[35] S. R. S. Raman, F. Wen, R. Pillarisetty, V. De, and J. P. Kulkarni, “High noise margin, digital logic design using Josephson junction field-effect transistors for cryogenic computing,” IEEE Transactions on Applied Superconductivity, vol. 31, no. 5, pp. 1–5, 2021.

[36] S. R. S. Raman, S. Xie, and J. P. Kulkarni, “Compute-in-eDRAM with backend integrated indium gallium zinc oxide transistors,” in 2021 IEEE International Symposium on Circuits and Systems (ISCAS), 2021, pp. 1–5.

[37] D. Reis, M. Niemier, and X. S. Hu, “Computing in memory with FeFETs,” in Proc. Int. Symp. Low Power Electron. Design, 2018, pp. 1–6.

[38] S. Rixner, W. J. Dally, U. J. Kapasi, P. Mattson, and J. D. Owens, “Memory access scheduling,” in Proceedings of the 27th Annual International Symposium on Computer Architecture, ser. ISCA ’00. ACM, 2000, pp. 128–138.

[39] A. Sebastian, M. Le Gallo, G. W. Burr, P. Narayan, I. Boybat, M. L. Gallo, S. R. Nandakumar, T. Tuma, and E. Eleftheriou, “Computational phase-change memory: beyond von Neumann computing,” Journal of Applied Physics, vol. 126, no. 15, p. 151101, 2019.

[40] V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Radojkovic, G. Boggs, T. Mudge, D. Burger, T. C. Mowry, and O. Mutlu, “RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization,” in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2013, pp. 185–197.

[41] V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M. A. Kozuch, O. Mutlu, P. B. Gibbons, and T. C. Mowry, “Ambit: In-memory accelerator for bulk bitwise operations using commodity DRAM technology,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-50 2017, Cambridge, MA, USA, October 14-18, 2017...

[42] V. S. Seshadri, “Understanding and improving DRAM performance,” Ph.D. dissertation, Carnegie Mellon University, 2015.

[43] S. D. Spetalnick et al., “30.1 A 40nm VLIW edge accelerator with 5MB of 0.256 pJ/b RRAM and a localization solver for bristle robot surveillance,” in 2024 IEEE International Solid-State Circuits Conference (ISSCC), 2024, pp. 1–3.

[44] S. R. Sundara Raman, L. John, and J. P. Kulkarni, “NEM-GNN: DAC/ADC-less, scalable, reconfigurable, graph and sparsity-aware near-memory accelerator for graph neural networks,” ACM Trans. Archit. Code Optim., vol. 21, no. 2, May. [Online]. Available: https://doi.org/10.1145/3652607

[46] S. R. Sundara Raman, L. K. John, and J. P. Kulkarni, “Sachi: A stationarity-aware, all-digital, near-memory, Ising architecture,” in 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024, pp. 719–731.

[47] S. R. Sundara Raman, S. S. T. Nibhanupudi, and J. P. Kulkarni, “Enabling in-memory computations in non-volatile SRAM designs,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 12, no. 2, pp. 557–568, 2022.

[48] S. R. Sundara Raman, S. Xie, and J. P. Kulkarni, “IGZO CIM: Enabling in-memory computations using multilevel capacitorless indium–gallium–zinc–oxide-based embedded DRAM technology,” IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 8, no. 1, pp. 35–43, 2022.

[49] D. Vogelsang, “Understanding the energy consumption of dynamic random access memories,” IEEE Micro, vol. 30, no. 1, pp. 26–34, 2010.

[50] W. Wan, R. Kubendran, C. Schaefer, S. Eryilmaz, W. Zhang, D. Wu, S. Deiss, P. Raina, H. Qian, B. Gao, S. Joshi, H. Wu, H.-S. Wong, and G. Cauwenberghs, “A compute-in-memory chip based on resistive random-access memory,” Nature, vol. 608, pp. 504–512, Aug. 2022.

[51] S. Xie, S. R. S. Raman, C. Ni, M. Wang, M. Yang, and J. P. Kulkarni, “Ising-CIM: A reconfigurable and scalable compute within memory analog Ising accelerator for solving combinatorial optimization problems,” IEEE Journal of Solid-State Circuits, pp. 1–13, 2022.