A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM
Pith reviewed 2026-05-10 18:53 UTC · model grok-4.3
The pith
DRAM-based compute-in-memory creates non-traditional current demands that require power delivery network aware designs for reliable scaling.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By classifying PIM-induced current patterns along temporal (burst versus sustained) and spatial (localized versus distributed) dimensions, the paper shows that representative DRAM PIM mechanisms stress the power delivery network through concurrent activations and large-scale parallel execution, producing voltage droop, IR drop, and thermal hotspots. It argues that DRAM-specific mitigations drawn from architectural timing, memory controller scheduling, data placement, and bank- and vault-level power management can address these stresses, establishing that PDN-aware design is necessary for scalable and reliable DRAM-based PIM systems.
What carries the argument
A unified taxonomy that classifies PIM-induced current behavior along temporal (burst vs. sustained) and spatial (localized vs. distributed) dimensions to map techniques to their PDN stresses.
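The two-axis taxonomy can be pictured as a small classifier. The following is a minimal sketch rather than the paper's method: the `CurrentProfile` fields, the `classify` function, and both thresholds are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from enum import Enum


class Temporal(Enum):
    BURST = "burst"
    SUSTAINED = "sustained"


class Spatial(Enum):
    LOCALIZED = "localized"
    DISTRIBUTED = "distributed"


@dataclass(frozen=True)
class CurrentProfile:
    """Hypothetical summary of one PIM operation's current demand."""
    duration_ns: float    # how long the elevated current lasts
    active_regions: int   # banks or subarrays drawing current at once
    total_regions: int    # regions available on the device


def classify(profile, burst_limit_ns=100.0, local_fraction=0.25):
    """Place a profile in one of the four taxonomy quadrants.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    if profile.total_regions <= 0:
        raise ValueError("total_regions must be positive")
    temporal = (Temporal.BURST if profile.duration_ns <= burst_limit_ns
                else Temporal.SUSTAINED)
    share = profile.active_regions / profile.total_regions
    spatial = (Spatial.LOCALIZED if share <= local_fraction
               else Spatial.DISTRIBUTED)
    return temporal, spatial
```

Under these assumed thresholds, a 50 ns event confined to 1 of 16 regions falls in the burst/localized quadrant, while a 5000 ns event spanning 12 of 16 regions falls in the sustained/distributed quadrant.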
Load-bearing premise
The representative PIM techniques surveyed capture the main current-demand patterns that will appear in future DRAM-based PIM deployments.
What would settle it
A measurement or simulation of a large-scale DRAM PIM system using multi-row activation and near-bank compute that runs heavy parallel workloads without producing significant voltage droops, IR drops, or thermal hotspots.
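As a rough illustration of what such a check might compute, the sketch below uses the standard first-order lumped model in which supply droop is the sum of a resistive (IR) term and an inductive (L·di/dt) term. The function names, the 5% tolerance, and all numeric values are assumptions for illustration, not figures from the paper.

```python
def pdn_droop_volts(delta_i_a, rise_time_s, resistance_ohm, inductance_h):
    """First-order droop estimate: IR drop plus L * di/dt.

    Treats the PDN as a single lumped series R-L path, which is a
    simplifying assumption; a real PDN is distributed and has decoupling.
    """
    if rise_time_s <= 0:
        raise ValueError("rise_time_s must be positive")
    ir_drop = delta_i_a * resistance_ohm
    inductive = inductance_h * delta_i_a / rise_time_s
    return ir_drop + inductive


def within_budget(droop_v, supply_v, tolerance=0.05):
    """True if the droop stays within an assumed fraction of the supply."""
    return droop_v <= supply_v * tolerance


# Illustrative numbers only: a 0.5 A current step in 1 ns through
# 50 milliohms and 20 pH, checked against a 1.2 V supply.
droop = pdn_droop_volts(0.5, 1e-9, 0.05, 20e-12)
ok = within_budget(droop, supply_v=1.2)
```

The inductive term grows as the rise time shrinks, which is why bursty activations can dominate droop even when the average current is modest.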
Original abstract
Compute-in-memory (PIM) mitigates the memory wall by performing computation within memory, reducing data movement and improving energy efficiency. DRAM-based PIM is particularly attractive due to its high density, mature manufacturing ecosystem, and compatibility with existing systems. Recent works exploit multiple levels of the DRAM hierarchy - including subarrays, banks, and 3D-stacked organizations - to enable in-memory computation using mechanisms such as multi-row activation, row-buffer operations, and near-bank compute units. However, these approaches introduce non-traditional current demand patterns that challenge the power delivery network (PDN). This paper surveys PDN challenges in DRAM-based PIM systems and proposes a unified taxonomy that characterizes PIM-induced current behavior along temporal (burst vs. sustained) and spatial (localized vs. distributed) dimensions. Using this framework, we analyze how representative PIM techniques stress the PDN through bursty activations, multi-row concurrency, and large-scale parallel execution, leading to voltage droop, IR drop, and thermal hotspots. We further discuss DRAM-specific mitigation strategies leveraging existing architectural and circuit-level mechanisms, including timing constraints, memory controller scheduling, data placement, and bank- and vault-level power management. This survey highlights the importance of PDN-aware design for scalable and reliable DRAM-based PIM systems and outlines key future research directions.
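One way to picture a timing-constraint mitigation is a controller-side throttle that caps how many row activations may issue within any rolling window, similar in spirit to the four-activate window that DDR devices already specify. The sketch below is hypothetical: the `ActivationThrottle` class, its parameters, and the example values are assumptions, not a mechanism taken from the paper.

```python
from collections import deque


class ActivationThrottle:
    """Allows at most max_activations row activations in any rolling window.

    Illustrative sketch: activations are assumed to issue in order, and the
    window is half-open, so an activation exactly window_ns after an earlier
    one no longer counts against it.
    """

    def __init__(self, max_activations, window_ns):
        if max_activations < 1:
            raise ValueError("max_activations must be at least 1")
        self.window_ns = window_ns
        # Issue times of the most recent max_activations activations.
        self._recent = deque(maxlen=max_activations)

    def issue(self, now_ns):
        """Issue an activation as early as allowed and return its time."""
        t = now_ns
        if self._recent:
            # Never issue earlier than the previous activation.
            t = max(t, self._recent[-1])
        if len(self._recent) == self._recent.maxlen:
            # The oldest tracked activation must have left the window.
            t = max(t, self._recent[0] + self.window_ns)
        self._recent.append(t)
        return t


# Six activations requested at time 0 with at most 4 per 40 ns window.
throttle = ActivationThrottle(max_activations=4, window_ns=40.0)
issue_times = [throttle.issue(0.0) for _ in range(6)]
```

Here the first four requests issue immediately and the remaining two are deferred to 40 ns, spreading the current demand of a multi-row burst over time at the cost of added latency.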
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. This survey paper examines power delivery network (PDN) challenges arising in DRAM-based compute-in/near-memory (PIM) systems. It proposes a taxonomy that classifies PIM-induced current demands along temporal (burst vs. sustained) and spatial (localized vs. distributed) dimensions. The authors apply the taxonomy to analyze representative techniques—multi-row activation, row-buffer operations, and near-bank compute—and the resulting stresses including voltage droop, IR drop, and thermal hotspots. The manuscript reviews mitigation approaches based on timing constraints, memory-controller scheduling, data placement, and bank/vault-level power management, and concludes by stressing the need for PDN-aware design in scalable DRAM-PIM systems while listing future research directions.
Significance. If the taxonomy proves robust, the survey could serve as a useful organizing lens for designers and researchers working on DRAM-PIM, encouraging earlier consideration of power-delivery constraints. Its primary contribution is synthesis of existing literature rather than new quantitative results or proofs; therefore its impact will depend on how comprehensively and accurately it maps the space of current-demand patterns.
major comments (1)
- The central claim that the proposed taxonomy enables characterization of PDN stresses for scalable DRAM-PIM systems rests on the representativeness of the three chosen mechanisms (multi-row activation, row-buffer operations, near-bank compute). No coverage metric, exhaustive enumeration of alternative organizations (e.g., subarray-level logic or different vault configurations), or argument that these techniques exhaust the relevant current-signature space is supplied. This assumption is load-bearing for the taxonomy's claimed utility and for the derived mitigation recommendations.
minor comments (2)
- The abstract and introduction would benefit from an explicit statement of how many PIM techniques are analyzed and which sections contain the detailed mapping onto the taxonomy quadrants.
- Several mitigation strategies are described qualitatively; adding even brief pointers to quantitative results or simulation data from the cited works would improve clarity without altering the survey nature of the manuscript.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our survey. We address the single major comment below, acknowledging the gap in explicit coverage discussion while strengthening the manuscript's presentation of the taxonomy's scope.
Point-by-point responses
Referee: The central claim that the proposed taxonomy enables characterization of PDN stresses for scalable DRAM-PIM systems rests on the representativeness of the three chosen mechanisms (multi-row activation, row-buffer operations, near-bank compute). No coverage metric, exhaustive enumeration of alternative organizations (e.g., subarray-level logic or different vault configurations), or argument that these techniques exhaust the relevant current-signature space is supplied. This assumption is load-bearing for the taxonomy's claimed utility and for the derived mitigation recommendations.
Authors: We agree that the manuscript does not supply an exhaustive enumeration, coverage metric, or formal argument that the three mechanisms exhaust the current-signature space. These techniques were selected because they are prominent in the surveyed literature and map distinctly onto the taxonomy axes (multi-row activation as burst-localized, row-buffer operations as sustained-distributed, and near-bank compute as high-parallelism). In the revised version we will add a new subsection following the taxonomy definition that (1) states the selection rationale, (2) explicitly lists example alternative organizations such as subarray-level logic and varied HBM vault configurations, and (3) clarifies that the taxonomy is offered as a general organizing lens rather than a complete enumeration. This makes the scope and limitations transparent while preserving the survey's synthesis contribution.
Revision: yes
Circularity Check
No circularity: survey taxonomy and analysis are self-contained organizational contributions
Full rationale
This paper is a survey that proposes a taxonomy along temporal and spatial dimensions to characterize current-demand patterns in DRAM-based PIM and then applies it to representative techniques drawn from prior literature. No equations, fitted parameters, predictions, or first-principles derivations appear in the provided text. The central claim that PDN-aware design is important rests on qualitative analysis of existing mechanisms rather than any reduction to quantities defined by the paper's own inputs or self-citations. The taxonomy is presented as a new organizational lens, not as a result forced by or equivalent to the surveyed data. External citations are to independent prior works and do not form a load-bearing self-citation chain.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction (tag: unclear)
  unclear: Relation between the paper passage and the cited Recognition theorem.
  Passage: subarray-level PIM... bank-level PIM... 3D level PIM... mitigation strategies leveraging existing architectural and circuit-level mechanisms
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks
  NEM-GNN is a scalable DAC/ADC-less processing-in-memory architecture for GNNs that uses early compute termination, reconfigurable SoC pre-computation, and compute-as-soon-as-ready broadcast execution to deliver large ...