A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks

Jaydeep P. Kulkarni; Lizy John; Siddhartha Raman Sundara Raman

arxiv: 2605.19405 · v3 · pith:J57STKHOnew · submitted 2026-05-19 · 💻 cs.AR

Pith reviewed 2026-06-30 17:53 UTC · model grok-4.3

classification 💻 cs.AR

keywords graph neural networks · processing-in-memory · near-memory accelerator · sparsity-aware architecture · reconfigurable computing · GNN acceleration · energy-efficient hardware

The pith

NEM-GNN uses a DAC/ADC-less near-memory design to accelerate graph neural networks with early termination and sparsity-aware aggregation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NEM-GNN as a processing-in-memory architecture tailored for the combination and aggregation stages of graph neural networks. Conventional processors incur high energy costs from moving irregular sparse graph data, while prior accelerators face limits from analog components and fixed structures. NEM-GNN avoids DACs and ADCs entirely, adds early compute termination, reconfigurable pre-computation, and a compute-as-soon-as-ready broadcast model to perform aggregation near memory. A sympathetic reader would see this as a route to running larger GNN workloads on specialized hardware with far lower power draw.

Core claim

NEM-GNN is a scalable processing-in-memory architecture that eliminates DACs and ADCs, applies early compute termination and reconfigurable system-on-chip pre-computation, and uses a compute-as-soon-as-ready broadcast model for graph- and sparsity-aware near-memory aggregation. The paper reports approximately 80-230x higher performance, 80-300x higher throughput, 850-1134x better energy efficiency, and 7-8x higher compute density than prior state-of-the-art approaches.

What carries the argument

The compute-as-soon-as-ready (CAR) and broadcast-based execution model for graph- and sparsity-aware near-memory aggregation.
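As a rough software analogue of that execution model, the sketch below broadcasts each node's combination result to its outgoing neighbours as soon as it is computed, rather than waiting for a separate aggregation pass. It assumes an unweighted graph stored as adjacency lists; the names `car_aggregate`, `combine`, and `out_neighbors` are hypothetical and are not taken from the paper.

```python
from typing import Callable, Dict, List

Vector = List[float]


def car_aggregate(
    features: Dict[int, Vector],
    out_neighbors: Dict[int, List[int]],
    combine: Callable[[Vector], Vector],
) -> Dict[int, Vector]:
    """Aggregate each combination result as soon as it is ready.

    Illustrative assumption: only edges that exist are visited, so
    absent edges cost no work (the sparsity-aware part).
    """
    acc: Dict[int, Vector] = {}
    for node, h in features.items():
        combined = combine(h)                     # combination for one node
        for dst in out_neighbors.get(node, []):   # broadcast to neighbours
            if dst not in acc:
                acc[dst] = [0.0] * len(combined)
            for k, value in enumerate(combined):
                acc[dst][k] += value              # accumulate at the destination
    return acc


# Toy example: 3 nodes, directed edges 0->1, 0->2, 1->2.
features = {0: [1.0, 2.0], 1: [3.0, 4.0], 2: [5.0, 6.0]}
out_neighbors = {0: [1, 2], 1: [2], 2: []}
result = car_aggregate(features, out_neighbors, lambda h: [2 * x for x in h])
```

Here the `combine` callback stands in for the combination stage; in this toy it simply doubles each feature, so node 2 ends up holding the sum of the doubled vectors from nodes 0 and 1.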

If this is right

GNN workloads can execute with orders-of-magnitude lower energy than on CPUs, GPUs, or prior accelerators.
The absence of DAC/ADC circuits allows higher compute density and better scaling to larger graphs.
Early termination and CAR execution reduce wasted work on irregular sparse data.
Reconfigurable pre-computation supports both dense convolution and sparse aggregation in one fabric.
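To make the digital, bit-serial flavour of these points more concrete, here is a toy Python sketch of a shift-and-add dot product that skips bit-planes containing no set bits. The skip rule is an assumption chosen for illustration and may differ from the paper's early compute termination criterion; `bit_serial_dot` and its parameters are hypothetical names.

```python
from typing import List, Tuple


def bit_serial_dot(h: List[int], w: List[int], bits: int) -> Tuple[int, int]:
    """Return (dot product, number of bit-planes actually computed).

    h holds unsigned `bits`-bit features and w holds integer weights.
    Each bit-plane of h selects which weights are summed; the partial
    sum is shifted by the bit position and accumulated.
    """
    if len(h) != len(w):
        raise ValueError("h and w must have the same length")
    if any(x < 0 or x >= (1 << bits) for x in h):
        raise ValueError("features must fit in the given unsigned bit width")
    total = 0
    planes_computed = 0
    for b in range(bits):
        plane = [(x >> b) & 1 for x in h]
        if not any(plane):
            continue  # all-zero bit-plane: skip it (assumed rule)
        partial = sum(wi for bit, wi in zip(plane, w) if bit)
        total += partial << b  # shift-and-add
        planes_computed += 1
    return total, planes_computed


# Only bit 1 is set in any feature, so one of the four planes is computed.
dot, planes = bit_serial_dot([2, 0, 2, 0], [3, -1, 4, 5], bits=4)
```

In this example the result matches the ordinary dot product (2*3 + 2*4 = 14) while touching a single bit-plane, which is the kind of saving that sparse or low-magnitude features could offer.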

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The reconfigurable elements may support adaptation to new GNN variants without full hardware respins.
The near-memory approach could extend to other sparse workloads such as recommendation systems or scientific simulations.
Integration of the CAR model with existing memory hierarchies might reduce the need for custom silicon in future chips.

Load-bearing premise

The reported performance and energy numbers reflect complete comparisons that include all reconfiguration, control, and data-movement overheads.

What would settle it

A side-by-side measurement of NEM-GNN against a prior accelerator on the same platform, with every overhead accounted for, that yields speedups below 10x would contradict the performance claims.

Figures

Figures reproduced from arXiv: 2605.19405 by Jaydeep P. Kulkarni, Lizy John, Siddhartha Raman Sundara Raman.

**Figure 1.** Undirected, unweighted graph with 5 nodes and 6 edges passing through a 1-layer GCN. Combination shows the MAC between the dense feature and weight matrices; aggregation shows the MAC between the sparse D⁻¹ and adjacency matrices to generate the final MAC before the ReLU and softmax functions. … Attention Networks (GAT), and GraphSage [32], [33], are being extensively researched. These explorations are geared towards unraveling specific …

**Figure 2.** Landscape of graph neural network acceleration. Prior works are predominantly dedicated accelerators requiring periodic host-accelerator interaction; they are further classified into Von Neumann, ReRAM-based PIM, and DRAM/HBM-based PIM designs. The proposed accelerator is not dedicated and reuses the cache in CPUs to perform GCNs. The bitcells for PIM designs are also shown. … the BL to half of the operating volta…

**Figure 3.** a) ReRAM approaches (i) use a DAC to convert the incoming H to an equivalent analog value, (ii) store GNN weights in a binary-scaled fashion, and (iii) use a current buffer and reductor to perform current-based summation and an ADC to generate H*W. b) Qualitative comparison between ReRAM approaches and NEM-GNN. c) Summary of the identified issues and the proposed solutions. … execution between combination and aggregation…

**Figure 4.** a) NEM-GNN is realized by repurposing the L1 cache for in-memory compute, with minimal near-memory peripheral logic added to each CPU core. b) In an L1 cache consisting of 2 banks, shift-and-add units are present at a granularity of 1 per 8 columns per bank, with 1 adder reduction/multiplier per bank and other dedicated logic shared across the entire cache. c) DRAM is accessed to transfer weights/featu…

**Figure 5.** a) Compute array organization for NEM-C1: 2 tiles with 4 banks in each tile are shown for illustration, with bit-serial PIM performed between H mapped onto RWL and W replicated across both tiles. A 2-bit, 8-element H and a 1-bit 8*3 weight matrix are shown, with H_ji^n indicating the nth bit of the jth element for the ith node. b) W is stored in 8T SRAM bitcells in the L1 cache, and H is mapped onto RWL. RBL discharge is used as a…

**Figure 6.** NEM-C2: Early compute termination (ECT) occurs once one of the bit-serial H element bits is found to be 1, without requiring data replication. The ECT datapath checks for a non-zero H bit in step 1 and writes the non-zero dot product into the ECT register in step 2. In parallel, the PIM datapath computes partial dot products in step 1 and subsequently stores them in the ECT register in step 2. This value is broadcast…

**Figure 7.** Incoming graphs are mapped onto different engines based on graph connectivity (graph-aware) and read-out of the adjacency matrix, stored in Compressed Sparse Row format, to eliminate unnecessary compute (sparsity-aware). UWC engine: aggregation of unweighted graphs by reading the adjacency matrix and the NodeProc register (indicating the node being processed by combination) to fill the update index register in ste…

**Figure 8.** a) UWC engine: Aggregation for an unweighted, directed graph begins with reading the adjacency vector corresponding to NodeProc in step 1, then identifies outgoing nodes and stores them in the Update Index register in step 2, and uses adders to aggregate the incoming combination vector onto the nodes in the Update Index register in step 3. Each adjacency matrix element is of the form (i,j), where i/j represents the neighbo…

**Figure 9.** a) Weighted, directed aggregation, with the adjacency matrix storing the weights of the graph and, for directed graphs, the direction. The direction is read out in step 1 to check for outgoing nodes in step 2, and aggregation with the incoming combination vector is achieved using near-memory multipliers and adders in step 3. b) Weighted, undirected aggregation follows the same datapath as the directed one,…

**Figure 10.** D-generator and control logic: degree matrix generator for generating D⁻¹ using a sparsity-aware approach that (i) performs element-by-vector (instead of vector-by-vector) multiplication for every row, and (ii) reduces the number of computations/area by a factor of 2n/n. Auxiliary control for ReLU and softmax is shown in the right-most figure. … undergoes immediate updates. This update involves the accumul…

**Figure 11.** Benchmarks: datasets for GNNs, the number of nodes/edges/features in each of them, and the network used for GCN/GAT/GraphSage networks. Micro-architecture of NEM-GNN with the additional near-memory logic requiring 2% of AMD's Zen3 CPU per-core area. … 6.2 Graph and sparsity-aware WC engine for Weighted graphs: For weighted graphs, the adjacency matrix (A) is re-purposed to store the weight of interaction betw…

**Figure 12.** Performance comparison normalized to NEM-C3 for GCN, GAT and GraphSage. The UWC engine is used for aggregation; NEM-C1, NEM-C2, and NEM-C3 are used for combination.

**Figure 13.** Throughput comparison measured in GOPS for GCN, GAT, and GraphSage. The UWC engine is used for aggregation; NEM-C1, NEM-C2, and NEM-C3 are used for combination. … Tesla V100, with 64 CUDA cores per streaming multiprocessor (SM) and an operating frequency of 1.5 GHz, with 96 KB L1 cache per SM, 6 MB L2 cache and 16 GB HBM2. AWB-GCN's performance is obtained from its implementation on Intel D5005 FPGA with DRAM capac…

**Figure 14.** Energy comparison for GCN, GAT and GraphSage. The UWC engine is used for aggregation; NEM-C1, NEM-C2, and NEM-C3 are used for combination.

**Figure 15.** Energy efficiency comparison for GCN, GAT and GraphSage. The UWC engine is used for aggregation; NEM-C1, NEM-C2, and NEM-C3 are used for combination. … the lower power. In comparison to ReFLIP, NEM-GNN has the following advantages: (i) no power-hungry DAC/ADC requirements, (ii) lower write/read voltages for SRAM than ReRAM, (iii) no additional write required to store back into the compute array post combination r…

**Figure 16.** a) Compute density comparison across PIM designs. b) NEM-C2 performance variation with number of Hs. c) NEM-C2 energy variation with bit resolution and average bit-position of the first '1'. d) Compute density and area for CS1, CS2, and CS3. e) Energy and efficiency for CS1, CS2, and CS3.

**Figure 17.** a) Performance and b) energy improvement of NEM-C3 relative to PIM-GCN. c) Speedup/energy improvement relative to Challapalle et al. d) Speedup and e) energy of NEM-C3 relative to PEDAL. … to NEM-C1 based design. The compute density is ∼7-8x that of ReFLIP, due to the elimination of bulky DACs/ADCs, no data replication, and sparsity-aware compute. Design space exploration: The performance of NEM-C2 varies roughly l…

**Figure 18.** Execution time/energy requirement/energy inefficiency of designs relative to NEM-C3 for a) the Reddit dataset and b) the Twitter dataset. UA means unavailable. … mainly because PIM-GCN faces challenges in hiding additional latency for performing CAM to identify neighbors in the scheduling policy, whereas it performs better for larger datasets. This results in speedups of ∼76x-105x, as depicted in Fig. 17a). Similar…
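The three-step aggregation described in the captions for Figures 7 to 9 (read the adjacency row for the node being processed, collect its outgoing nodes, then accumulate) can be sketched in Python over a Compressed Sparse Row layout. This is a software illustration under stated assumptions, not the hardware datapath: `aggregate_node`, `row_ptr`, `col_idx`, and `edge_weight` are hypothetical names, and unweighted graphs are modelled with an implicit weight of 1.

```python
from typing import List, Optional


def aggregate_node(
    node_proc: int,
    combined: List[float],
    row_ptr: List[int],
    col_idx: List[int],
    acc: List[List[float]],
    edge_weight: Optional[List[float]] = None,
) -> List[int]:
    """Apply one node's combination vector to its outgoing neighbours.

    Step 1: read the CSR row for `node_proc`.
    Step 2: collect the outgoing node ids (the "update index" list).
    Step 3: accumulate into each destination; unweighted graphs use an
            implicit weight of 1, weighted graphs scale by the edge weight.
    """
    start, end = row_ptr[node_proc], row_ptr[node_proc + 1]  # step 1
    update_index = col_idx[start:end]                        # step 2
    for offset, dst in enumerate(update_index):              # step 3
        scale = 1.0 if edge_weight is None else edge_weight[start + offset]
        for k, value in enumerate(combined):
            acc[dst][k] += scale * value
    return update_index


# Toy directed graph with edges 0->1, 0->2, 1->2 in CSR form.
row_ptr = [0, 2, 3, 3]
col_idx = [1, 2, 2]
acc = [[0.0, 0.0] for _ in range(3)]

first = aggregate_node(0, [1.0, 2.0], row_ptr, col_idx, acc)        # unweighted
second = aggregate_node(1, [3.0, 4.0], row_ptr, col_idx, acc,
                        edge_weight=[0.5, 0.5, 2.0])                # weighted
```

Because only the stored entries of each row are read, nodes without an edge from `node_proc` are never touched, which is the software counterpart of eliminating unnecessary compute.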

Original abstract

Graph neural networks (GNNs) have gained significant interest for applications such as citation network analysis and drug discovery due to their ability to apply machine learning techniques on graph-structured data. GNNs typically employ a two-stage execution pipeline consisting of combination and aggregation kernels. The combination stage performs data-intensive convolution operations with relatively regular memory access patterns, whereas the aggregation stage operates on sparse graph data with highly irregular accesses. These heterogeneous memory behaviors make conventional CPU- and GPU-based execution energy inefficient due to substantial data movement overheads. Existing accelerators attempt to mitigate these challenges using specialized architectures and processing-in-memory (PIM) techniques. However, prior approaches often suffer from scalability limitations, area overheads, restricted parallelism, and energy inefficiencies associated with analog compute and dedicated accelerator structures. This paper presents NEM-GNN, a scalable DAC/ADC-less processing-in-memory architecture for graph neural network acceleration. The proposed design introduces early compute termination mechanisms, pre-computation using reconfigurable system-on-chip components, and graph- and sparsity-aware near-memory aggregation using a compute-as-soon-as-ready (CAR) and broadcast-based execution model. Experimental results demonstrate that NEM-GNN achieves approximately 80--230x higher performance, 80--300x higher throughput, 850--1134x better energy efficiency, and 7--8x higher compute density compared to prior state-of-the-art approaches.
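The two-stage pipeline in the abstract, following the 1-layer GCN of Figure 1, can be written out as a small pure-Python sketch: combination is the dense product H x W, aggregation multiplies by the adjacency matrix and scales by the inverse degree, and ReLU is applied at the end. The absence of self-loops, the helper names `matmul` and `gcn_layer`, and the example values are assumptions made for illustration only.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1 * A * (H * W)), with D the degree matrix."""
    combined = matmul(h, w)          # combination: dense, regular accesses
    summed = matmul(adj, combined)   # aggregation: sparse in practice
    out = []
    for row_adj, row in zip(adj, summed):
        degree = sum(row_adj)
        inv = 1.0 / degree if degree else 0.0          # D^-1 entry
        out.append([max(0.0, inv * v) for v in row])   # ReLU
    return out


# Undirected path graph 0 - 1 - 2, two input features, two output features.
adj = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, -1.0], [2.0, 0.5]]
out = gcn_layer(adj, h, w)
```

A dense adjacency matrix is used here only to keep the example short; the irregular, mostly-zero structure of `adj` on real graphs is what makes the aggregation stage the harder one for conventional hardware.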

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The headline speedups look large but rest on an abstract with zero experimental details, so the central claims can't be assessed yet.

The main takeaway is that NEM-GNN claims 80-230x performance and 850-1134x energy-efficiency gains over prior work, yet the abstract supplies no benchmarks, baselines, or description of how reconfiguration, control, and data-movement costs were modeled. That gap makes the numbers impossible to evaluate from the abstract alone.

The paper does lay out a digital, DAC/ADC-less PIM design that adds early termination, reconfigurable SoC pre-computation, and a CAR broadcast model aimed at the irregular accesses in GNN aggregation. Those mechanisms directly target the combination-versus-aggregation split that makes standard CPUs and GPUs inefficient on graph data. The choice to stay fully digital and avoid analog compute overheads is a clear departure from some earlier PIM accelerators.

The soft spot is the evaluation. Without seeing the methodology section, it is impossible to know whether the reported gains include the full cost of the new features or whether they rely on optimistic assumptions about data movement and control logic. If the full paper uses cycle-accurate models that properly charge those costs, the work strengthens; if not, the comparisons are not apples-to-apples. The citation pattern follows standard PIM literature, which is fine, but the differences from that prior work need to be demonstrated in the experiments.

This paper is aimed at hardware designers working on near-memory accelerators for sparse and graph workloads in scientific computing or drug discovery. A reader already thinking about PIM fabrics for irregular ML would find the architectural ideas worth examining even if the numbers require verification. It deserves a serious referee because the topic is relevant and the claims are concrete enough that reviewers can ask for the missing experimental details and check whether the overheads are fully accounted for.

I would send it to peer review rather than desk reject so the experimental setup can be scrutinized.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes NEM-GNN, a fully reconfigurable, digital, scalable, graph- and sparsity-aware near-memory accelerator for GNNs. It introduces early compute termination mechanisms, pre-computation using reconfigurable SoC components, and graph/sparsity-aware near-memory aggregation via a compute-as-soon-as-ready (CAR) and broadcast-based execution model in a DAC/ADC-less PIM fabric. The central claim, supported by experimental results, is that NEM-GNN delivers approximately 80--230x higher performance, 80--300x higher throughput, 850--1134x better energy efficiency, and 7--8x higher compute density versus prior state-of-the-art approaches.

Significance. If the reported speedups and efficiency gains are shown to incorporate the full costs of the proposed mechanisms, the work would constitute a meaningful contribution to digital PIM architectures for irregular workloads, offering a scalable alternative to analog and fixed-structure accelerators while addressing data-movement bottlenecks in GNN combination and aggregation stages.

major comments (2)

[Experimental Results] Experimental Results section: the headline quantitative claims (80--230x performance, 850--1134x energy efficiency) rest on the assumption that all reconfiguration, control, and data-movement overheads of the CAR execution model, reconfigurable SoC pre-computation, and broadcast mechanisms are included in the cycle-accurate or analytical models; the section provides no explicit validation or breakdown demonstrating this, which directly undermines the apples-to-apples comparison with baselines.
[§4] §4 (or equivalent evaluation subsection): no description is given of the benchmark selection, input graph characteristics, error bars, or the precise modeling of PIM fabric control logic costs; without these, the 7--8x compute-density claim cannot be assessed as load-bearing evidence.

minor comments (2)

[Abstract] Abstract: the performance numbers are stated without any reference to methodology, benchmarks, or validation approach, reducing immediate clarity.
[Introduction / Architecture] Notation for CAR and broadcast model: the description in the introduction and architecture sections would benefit from a small diagram or pseudocode to clarify the timing of 'as-soon-as-ready' decisions relative to sparsity patterns.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's detailed review and constructive feedback on our manuscript. We address the major comments below and will revise the paper to incorporate additional details and clarifications as requested.

Referee: [Experimental Results] Experimental Results section: the headline quantitative claims (80--230x performance, 850--1134x energy efficiency) rest on the assumption that all reconfiguration, control, and data-movement overheads of the CAR execution model, reconfigurable SoC pre-computation, and broadcast mechanisms are included in the cycle-accurate or analytical models; the section provides no explicit validation or breakdown demonstrating this, which directly undermines the apples-to-apples comparison with baselines.

Authors: We thank the referee for highlighting this important point. Our cycle-accurate simulator and analytical models do account for the reconfiguration, control, and data-movement overheads associated with the CAR execution model, reconfigurable SoC pre-computation, and broadcast mechanisms. These costs are modeled based on the hardware implementation details provided in Sections 3 and 4. However, to make this explicit and strengthen the comparison, we will add a dedicated subsection in the Experimental Results section that provides a breakdown of these overheads and validation against the baseline models. This will ensure transparency in the apples-to-apples comparison. revision: yes
Referee: [§4] §4 (or equivalent evaluation subsection): no description is given of the benchmark selection, input graph characteristics, error bars, or the precise modeling of PIM fabric control logic costs; without these, the 7--8x compute-density claim cannot be assessed as load-bearing evidence.

Authors: We agree that additional details on the evaluation methodology are necessary for full assessment of the results. In the revised manuscript, we will expand Section 4 (or the evaluation subsection) to include: (1) a detailed description of the benchmark selection criteria and the characteristics of the input graphs used (e.g., number of nodes, edges, sparsity levels), (2) error bars or statistical measures from multiple runs where applicable, and (3) a precise description of how the PIM fabric control logic costs are modeled in our simulations, including area and energy estimates. This will provide the necessary context to evaluate the 7--8x compute-density claim. revision: yes

Circularity Check

0 steps flagged

No circularity: architecture paper with experimental claims, no derivation chain or fitted parameters

The paper describes a hardware architecture (NEM-GNN) and reports empirical speedups/energy numbers from experiments against prior SOTA. No equations, fitted parameters, self-definitional steps, or load-bearing self-citations appear in the abstract or description. Claims rest on external benchmark comparisons rather than reducing to inputs by construction. This matches the default expectation of no significant circularity for non-mathematical papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5798 in / 1167 out tokens · 28132 ms · 2026-06-30T17:53:16.159718+00:00 · methodology

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture
cs.AR · 2026-05 · unverdicted · novelty 6.0

SPARK is a sparsity-aware near-cache ILP accelerator achieving up to 15x performance and 152x energy reduction over CPUs for sparse ILPs on MIPLIB 2017 workloads.

A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture
cs.AR · 2026-05 · unverdicted · novelty 5.0

SPARK is a sparsity-aware near-cache ILP accelerator that reuses L1 cache structures to deliver up to 15x speedup and 152x energy reduction versus CPUs on sparse MIPLIB workloads with 1.4% area overhead.

A detailed algorithmic study on a reuse-aware, near memory, all-digital Ising machine
cs.AR · 2026-05 · unverdicted · novelty 4.0

SACHI reuses CPU L1 cache for all-digital Ising acceleration and reports 300x performance and 80x energy gains over BRIM on asset allocation, molecular dynamics, image segmentation, and TSP.

Emerging memory technologies at room/cryogenic temperature
cs.AR · 2026-05 · unverdicted · novelty 1.0

Overview chapter surveying volatile and non-volatile memories including SRAM, DRAM, RRAM, MRAM, FeFET and cryogenic JJFET devices, with focus on principles, tradeoffs, and challenges.

Reference graph

Works this paper leans on

40 extracted references · 27 canonical work pages · cited by 3 Pith papers · 7 internal anchors

[1] Pavan Kumar Reddy Boppidi, S. Siddhartha Raman, H. Renuka, and Souvik Kundu. 2020. Pt/Cu:ZnO/Nb:STO memristive dual port for cache memory applications. AIP Conference Proceedings 2265, 1 (11 2020), 030212. https://doi.org/10.1063/5.0016597
[2] Michael M Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst. 2017. Geometric deep learning: going beyond euclidean data. IEEE Signal Processing Magazine 34, 4 (2017), 18–42.
[3] Nagadastagiri Challapalle, Karthik Swaminathan, Nandhini Chandramoorthy, and Vijaykrishnan Narayanan. 2021. Crossbar based Processing in Memory Accelerator Architecture for Graph Convolutional Networks. In 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD). 1–9. https://doi.org/10.1109/ICCAD51958.2021.9643465
[4] Jiaxian Chen, Yiquan Lin, Kaoyi Sun, Jiexin Chen, Chenlin Ma, Rui Mao, and Yi Wang. 2022. GCIM: Toward Efficient Processing of Graph Convolutional Networks in 3D-Stacked Memory. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 11 (2022), 3579–3590. https://doi.org/10.1109/TCAD.2022.3198320
[5] Yuhan Chen, Alireza Khadem, Xin He, Nishil Talati, Tanvir Ahmed Khan, and Trevor Mudge. 2023. PEDAL: A Power Efficient GCN Accelerator with Multiple DAtafLows. In 2023 Design, Automation & Test in Europe Conference Exhibition (DATE). 1–6. https://doi.org/10.23919/DATE56975.2023.10137240
[6] Matthias Fey and Jan Eric Lenssen. 2019. Fast graph representation learning with PyTorch Geometric. arXiv preprint arXiv:1903.02428 (2019).
[7] Tong Geng, Ang Li, Runbin Shi, Chunshu Wu, Tianqi Wang, Yanfei Li, Pouya Haghi, Antonino Tumeo, Shuai Che, Steve Reinhardt, et al. 2020. AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing. In 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 922–936.
[8] Yu Huang, Long Zheng, Pengcheng Yao, Qinggang Wang, Xiaofei Liao, Hai Jin, and Jingling Xue. 2022. Accelerating Graph Convolutional Networks Using Crossbar-based Processing-In-Memory Architectures. In 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 1029–1042.
[9] Chanwoo Jeong, Sion Jang, Eunjeong Park, and Sungchul Choi. 2020. A context-aware citation recommendation model with BERT and graph convolutional networks. Scientometrics 124, 3 (2020), 1907–1922.
[10] Jaydeep P. Kulkarni, Siddhartha Raman Sundara Raman, Shanshan Xie, and Chieh-Pu Lo. 2025. Unconventional Computing Using Ising Accelerators. Computer 58, 6 (2025), 83–86. https://doi.org/10.1109/MC.2025.3544798
[11] Sukhan Lee, Shin-haeng Kang, Jaehoon Lee, Hyeonsu Kim, Eojin Lee, Seungwoo Seo, Hosang Yoon, Seungwon Lee, Kyounghwan Lim, Hyunsung Shin, Jinhyun Kim, O Seongil, Anand Iyer, David Wang, Kyomin Sohn, and Nam Sung Kim. 2021. Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product. In 2021 ACM/IEEE 48th Annual... doi:10.1109/isca52012.2021.00013
[12] Onur Mutlu, Saugata Ghose, Juan Gómez-Luna, and Rachata Ausavarungnirun. 2020. A modern primer on processing in memory. arXiv preprint arXiv:2012.03112 (2020).
[13] S. S. Teja Nibhanupudi, Siddhartha Raman Sundara Raman, and Jaydeep P. Kulkarni. 2021. Phase Transition Material-Assisted Low-Power SRAM Design. IEEE Transactions on Electron Devices 68, 5 (2021), 2281–2288. https://doi.org/10.1109/TED.2021.3067849
[14] S. S. Teja Nibhanupudi, Siddhartha Raman Sundara Raman, Mikaël Cassé, Louis Hutin, and Jaydeep P. Kulkarni. 2021. Ultra-Low-Voltage UTBB-SOI-Based, Pseudo-Static Storage Circuits for Cryogenic CMOS Applications. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits 7, 2 (2021), 201–208. https://doi.org/10.1109/JXCDC.2021.3130839
[15] Hongbin Pei, Bingzhe Wei, Kevin Chen-Chuan Chang, Yu Lei, and Bo Yang. 2020. Geom-gcn: Geometric graph convolutional networks. arXiv preprint arXiv:2002.05287 (2020).
[16] Yikan Qiu, Yufei Ma, Wentao Zhao, Meng Wu, Le Ye, and Ru Huang. 2022. DCIM-GCN: Digital Computing-in-Memory to Efficiently Accelerate Graph Convolutional Networks. In 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD). 1–9.
[17] Siddhartha Raman Sundara Raman. 2024. A Review on Non-Volatile and Volatile Emerging Memory Technologies. In Computer Memory and Data Storage, Azam Seyedi (Ed.). IntechOpen, Rijeka, Chapter 3. https://doi.org/10.5772/intechopen.110617
[18] Siddhartha Raman Sundara Raman. 2026. Compute in eDRAM using indium gallium zinc oxide transistors. Ph.D. dissertation. The University of Texas at Austin. https://repositories.lib.utexas.edu/items/4dbc7f92-c062-4cb8-b07b-ed29761b9704
[19] Siddhartha Raman Sundara Raman. 2026. Emerging memory technologies at room/cryogenic temperature. arXiv:2605.21912 [cs.AR] https://arxiv.org/abs/2605.21912
[20] Siddhartha Raman Sundara Raman, Lizy John, and Jaydeep P. Kulkarni. 2025. SPARK: Sparsity Aware, Low Area, Energy-Efficient, Near-memory Architecture for Accelerating Linear Programming Problems. In 2025 IEEE International Symposium on High Performance Computer Architecture (HPCA). 99–112. https://doi.org/10.1109/HPCA61900.2025.00019
[21] Siddhartha Raman Sundara Raman, Lizy K John, and Jaydeep P. Kulkarni. 2026. A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture. arXiv:2605.17158 [cs.AR] https://arxiv.org/abs/2605.17158
[22] Siddhartha Raman Sundara Raman, Lizy K. John, and Jaydeep P. Kulkarni. 2026. A detailed algorithmic study on a reuse-aware, near memory, all-digital Ising machine. arXiv:2605.12959 [cs.AR] https://arxiv.org/abs/2605.12959
[23] Siddhartha Raman Sundara Raman and Jaydeep P. Kulkarni. 2026. ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute. arXiv:2602.14262 [cs.AR] https://arxiv.org/abs/2602.14262
[24] Siddhartha Raman Sundara Raman, Siyuan Ma, and Lizy Kurian John. 2026. A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM. arXiv:2604.04773 [cs.AR] https://arxiv.org/abs/2604.04773
[25]

Siddhartha Raman Sundara Raman, S. S. Teja Nibhanupudi, Atanu K. Saha, Sumeet Gupta, and Jaydeep P. Kulkarni. 2021. Threshold Selector and Capacitive Coupled Assist Techniques for Write Voltage Reduction in Metal–Ferroelectric–Metal Field-Effect Transistor.IEEE Transactions on Electron Devices68, 12 (2021), 6132–6138. https://doi.org/10.1109/TED. 2021.3121348

work page doi:10.1109/ted 2021
[26]

Kulkarni

Siddhartha Raman Sundara Raman, Feng Wen, Ravi Pillarisetty, Vivek De, and Jaydeep P. Kulkarni. 2021. High Noise Margin, Digital Logic Design Using Josephson Junction Field-Effect Transistors for Cryogenic Computing.IEEE Transactions on Applied Superconductivity31, 5 (2021), 1–5. https://doi.org/10.1109/TASC.2021.3054347

work page doi:10.1109/tasc.2021.3054347 2021
[27]

Siddhartha Raman Sundara Raman, Shanshan Xie, and Jaydeep P Kulkarni. 2022. IGZO CIM: Enabling In-Memory Computations Using Multilevel Capacitorless Indium–Gallium–Zinc–Oxide-Based Embedded DRAM Technology. IEEE Journal on Exploratory Solid-State Computational Devices and Circuits8, 1 (2022), 35–43

2022
[28]

Siddhartha Raman Sundara Raman, Shanshan Xie, and Jaydeep P.Kulkarni. 2021. Compute-in-eDRAM with Backend Integrated Indium Gallium Zinc Oxide Transistors. In2021 IEEE International Symposium on Circuits and Systems (ISCAS). 1–5. https://doi.org/10.1109/ISCAS51556.2021.9401798

[29]

Rishov Sarkar, Stefan Abi-Karam, Yuqi He, Lakshmi Sathidevi, and Cong Hao. 2023. FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference. In 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 1099–1112. https://doi.org/10.1109/HPCA56546.2023.10071015

[30]

James E. Stine, Ivan Castellanos, Michael Wood, Jeff Henson, Fred Love, W. Rhett Davis, Paul D. Franzon, Michael Bucher, Sunil Basavarajaiah, Julie Oh, et al. 2007. FreePDK: An open-source variation-aware design kit. In 2007 IEEE International Conference on Microelectronic Systems Education (MSE’07). IEEE, 173–174

[31]

Siddhartha Raman Sundara Raman, Lizy John, and Jaydeep P. Kulkarni. 2024. NEM-GNN: DAC/ADC-less, Scalable, Reconfigurable, Graph and Sparsity-Aware Near-Memory Accelerator for Graph Neural Networks. ACM Trans. Archit. Code Optim. 21, 2, Article 39 (May 2024), 26 pages. https://doi.org/10.1145/3652607

[32]

Siddhartha Raman Sundara Raman, Lizy K. John, and Jaydeep P. Kulkarni. 2024. SACHI: A Stationarity-Aware, All-Digital, Near-Memory, Ising Architecture. In 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA). 719–731. https://doi.org/10.1109/HPCA57654.2024.00061

[33]

Siddhartha Raman Sundara Raman, S. S. Teja Nibhanupudi, and Jaydeep P. Kulkarni. 2022. Enabling In-Memory Computations in Non-Volatile SRAM Designs. IEEE Journal on Emerging and Selected Topics in Circuits and Systems 12, 2 (2022), 557–568. https://doi.org/10.1109/JETCAS.2022.3174148

[34]

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)

[35]

Hongwei Wang, Jialin Wang, Jia Wang, Miao Zhao, Weinan Zhang, Fuzheng Zhang, Wenjie Li, Xing Xie, and Minyi Guo. 2019. Learning graph representation with generative adversarial nets. IEEE Transactions on Knowledge and Data Engineering 33, 8 (2019), 3090–3103

[36]

Shanshan Xie, Siddhartha Raman Sundara Raman, Can Ni, Meizhi Wang, Mengtian Yang, and Jaydeep P. Kulkarni. 2022. Ising-CIM: A Reconfigurable and Scalable Compute Within Memory Analog Ising Accelerator for Solving Combinatorial Optimization Problems. IEEE Journal of Solid-State Circuits (2022), 1–13. https://doi.org/10.1109/JSSC.2022.3176610

[37]

Xinfeng Xie, Zheng Liang, Peng Gu, Abanti Basak, Lei Deng, Ling Liang, Xing Hu, and Yuan Xie. 2021. Spacea: Sparse matrix vector multiplication on processing-in-memory accelerator. In 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 570–583

[38]

Mingyu Yan, Lei Deng, Xing Hu, Ling Liang, Yujing Feng, Xiaochun Ye, Zhimin Zhang, Dongrui Fan, and Yuan Xie. 2020. Hygcn: A gcn accelerator with hybrid architecture. In 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA). IEEE, 15–29
[40]

Tao Yang, Dongyue Li, Yibo Han, Yilong Zhao, Fangxin Liu, Xiaoyao Liang, Zhezhi He, and Li Jiang. 2021. PIMGCN: A ReRAM-Based PIM Design for Graph Convolutional Network Acceleration. In 2021 58th ACM/IEEE Design Automation Conference (DAC). 583–588. https://doi.org/10.1109/DAC18074.2021.9586231

