
arxiv: 2604.13969 · v1 · submitted 2026-04-15 · 💻 cs.AR

GEM3D CIM: General Purpose Matrix Computation Using 3D Integrated SRAM-eDRAM Hybrid Compute-In-Memory-on-Memory Architecture

Pith reviewed 2026-05-10 12:04 UTC · model grok-4.3

classification 💻 cs.AR
keywords compute-in-memory · SRAM · eDRAM · 3D integration · matrix operations · transpose · DNN acceleration

The pith

A 3D SRAM-eDRAM hybrid architecture runs general matrix operations inside memory at 4-bit precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a compute-in-memory (CIM) design that stacks SRAM and eDRAM in three dimensions to perform not only dot products but also matrix transposes, element-wise additions, and element-wise multiplications directly within the memory array. Current CIM approaches are largely limited to multiply-accumulate (dot-product) steps and therefore struggle to handle the broader matrix work that many algorithms require. The design uses a transpose-based layout, in-memory arithmetic, and peripheral-aware circuits to balance latency, energy, and density while remaining compatible with standard dot-product CIM blocks. If the approach holds, it would let memory arrays handle more complete matrix workloads without moving the data out to separate compute units.

Core claim

The proposed 3D-integrated SRAM-eDRAM hybrid CIM architecture performs general matrix operations directly within the memory crossbar at 4-bit precision by combining a specialized transpose-based structure, in-memory arithmetic operations, peripheral-aware design, and vertical SRAM-eDRAM integration, thereby balancing latency, energy efficiency, and compute density while staying compatible with conventional CIM dot-product architectures.
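
To make the operation set concrete, the following is a minimal software reference model of the operations named in the claim. It is a sketch under stated assumptions, not the paper's circuit: operands are assumed to be unsigned 4-bit integers (the source does not say whether they are signed), results are kept at full width rather than requantized, and all function names are illustrative.

```python
from typing import List

Matrix = List[List[int]]

BITS = 4                     # operand precision stated in the claim
MAX_VAL = (1 << BITS) - 1    # 15; the unsigned range is an assumption


def check_operand(m: Matrix) -> None:
    """Raise ValueError if any entry is outside the assumed 4-bit range."""
    for row in m:
        for v in row:
            if not 0 <= v <= MAX_VAL:
                raise ValueError(f"{v} is not an unsigned {BITS}-bit value")


def check_same_shape(a: Matrix, b: Matrix) -> None:
    """Raise ValueError if the two matrices differ in shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("operands must have the same shape")


def transpose(m: Matrix) -> Matrix:
    """Return the transpose: rows become columns."""
    check_operand(m)
    return [list(col) for col in zip(*m)]


def elementwise_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum; each result needs up to 5 bits."""
    check_operand(a)
    check_operand(b)
    check_same_shape(a, b)
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def elementwise_mul(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise (Hadamard) product; each result needs up to 8 bits."""
    check_operand(a)
    check_operand(b)
    check_same_shape(a, b)
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Conventional dot-product (MAC) workload, included for comparison."""
    check_operand(a)
    check_operand(b)
    if len(a[0]) != len(b):
        raise ValueError("inner dimensions must match")
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


a = [[1, 2], [3, 15]]
b = [[15, 0], [4, 15]]
print(transpose(a))           # [[1, 3], [2, 15]]
print(elementwise_add(a, b))  # [[16, 2], [7, 30]]
print(elementwise_mul(a, b))  # [[15, 0], [12, 225]]
print(matmul(a, b))           # [[23, 30], [105, 225]]
```

Under the unsigned assumption, an element-wise sum needs up to 5 bits and an element-wise product up to 8 bits. How the hardware handles this growth in word width is not described in this summary.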

What carries the argument

The 3D SRAM-eDRAM hybrid memory-on-memory CIM crossbar with transpose-based in-memory arithmetic.
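
Figure 3(a) describes the transpose as a reordering of matrix elements about the diagonal. A software analogue, sketched below, swaps each above-diagonal element with its mirror image below the diagonal. It illustrates the data reordering only; it is not the bit-cell mechanism, which this summary does not describe, and the function name is hypothetical.

```python
from typing import List


def transpose_in_place(m: List[List[int]]) -> int:
    """Transpose a square matrix in place and return the number of swaps."""
    n = len(m)
    if any(len(row) != n for row in m):
        raise ValueError("in-place transpose needs a square matrix")
    swaps = 0
    for i in range(n):
        # Only visit positions above the diagonal; the diagonal stays put.
        for j in range(i + 1, n):
            m[i][j], m[j][i] = m[j][i], m[i][j]
            swaps += 1
    return swaps


m = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
swaps = transpose_in_place(m)
print(m)      # [[1, 4, 7], [2, 5, 8], [3, 6, 9]]
print(swaps)  # 3
```

For an n × n matrix this performs n(n-1)/2 swaps and leaves the n diagonal elements untouched.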

If this is right

  • CIM arrays could execute complete matrix-level tasks instead of being limited to dot products.
  • The same hardware remains usable for conventional MAC operations without redesign.
  • Data movement between memory and compute units decreases for workloads heavy in transposes or element-wise math.
  • The architecture supports general matrix work at 4-bit precision while preserving energy and density targets.
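
The data-movement point can be put in rough numbers by counting the bits an off-array implementation would have to move for each operation. The sketch below is a back-of-envelope model, not a result from the paper. It assumes unsigned 4-bit operands, full-width results (5 bits for sums, 8 bits for products), and that every operand is read once and every result written once. The 64 × 64 matrix size is a hypothetical example, and the model says nothing about how much traffic the in-memory design itself still incurs.

```python
def offarray_traffic_bits(rows: int, cols: int, op: str, bits: int = 4) -> int:
    """Bits moved between memory and a separate compute unit for one operation.

    Assumed model: each operand element is read once, each result written once.
    """
    n = rows * cols
    if op == "transpose":
        # Read every element out, write every element back in its new place.
        return n * bits + n * bits
    if op == "add":
        # Read two operands; the sum of two b-bit values needs b + 1 bits.
        return 2 * n * bits + n * (bits + 1)
    if op == "mul":
        # Read two operands; the product of two b-bit values needs 2b bits.
        return 2 * n * bits + n * (2 * bits)
    raise ValueError(f"unknown operation: {op}")


for op in ("transpose", "add", "mul"):
    print(op, offarray_traffic_bits(64, 64, op))
# transpose 32768
# add 53248
# mul 65536
```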

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Larger systems could chain multiple such memory stacks for bigger matrix problems without external memory traffic.
  • The approach may extend to other memory types or higher bit widths if the 3D stacking overheads stay manageable.
  • It opens a path for CIM to serve general high-performance computing workloads beyond neural-network inference.

Load-bearing premise

The specialized transpose architecture, in-memory arithmetic, peripheral design, and 3D SRAM-eDRAM stacking can be built without large unaccounted overheads that would destroy the claimed balance of speed, power, and density.

What would settle it

Fabrication results from a test chip showing measured latency, energy per operation, and effective compute density for matrix transpose and element-wise multiplication, compared against the simulated targets.

Figures

Figures reproduced from arXiv: 2604.13969 by Akhilesh R. Jaiswal, Ankur Singh, Subhradip Chakraborty.

Figure 1: The proposed overall architecture comprises two Layers: Layer A, …
Figure 2: Transpose based: (a) T-SRAM bit-cell; (b) T-eDRAM bit-cell; Multiplication and Addition based: (c) MA-SRAM bit-cell; (d) MA-eDRAM bit-cell.
Figure 3: (a) Illustration of a 3 × 3 matrix transpose operation, demonstrating the reordering of matrix elements along the diagonal; (b) SRAM multi-sub-array architecture comprising both Transpose-SRAM (T-SRAM) and Multiply-Accumulate SRAM (MA-SRAM) banks; (c) Sub-array level organization of the T-eDRAM architecture; (d) Proposed T-SRAM crossbar implementing a 4-bit word; and (e) Proposed T-eDRAM crossbar with a 4-…
Figure 4: Mapping of elements of Matrix A and B to perform element-wise …
Figure 5: (a) Proposed MA-SRAM sub-array implementing a 4-bit word structure; (b) Proposed MA-eDRAM sub-array designed with an 8-bit word organization; …
Figure 6: Data flow for performing analog addition using DAC …
Figure 7: Transient simulation on GF 22 nm FDSOI technology demonstrating …
Figure 8: Transient simulation demonstrating the copying of upper-diagonal …
Figure 10: (a) T-SRAM DAC output voltage as a function of 4-bit digital input …
Figure 12: Variation of signal margin for (a) analog multiplication and (b) …
Figure 13: Linearity analysis of 8-bit LFSR ADC for different (a) analog …
Original abstract

With the rapid growth of deep neural networks (DNNs), compute-in-memory (CIM) has emerged as a promising energy-efficient paradigm for accelerating multiply-and-accumulate (MAC) operations. Yet, current CIM architectures are largely limited to dot-product computations and struggle to efficiently support general-purpose matrix operations, such as transpose, element-wise addition, and multiplication. This work presents a 3D-integrated, memory-on-memory SRAM-eDRAM hybrid CIM architecture, implemented in GlobalFoundries 22~nm FDSOI technology, capable of performing general matrix operations directly within the memory crossbar with 4-bit precision. By leveraging a specialized transpose-based architecture, in-memory arithmetic operations, peripheral-aware design, and 3D SRAM-eDRAM integration, the proposed architecture balances latency, energy efficiency, and compute density for general purpose matrix operations while remaining compatible with the conventional CIM dot product architectures. Overall, this memory-on-memory CIM framework generalizes CIM beyond dot products, enabling versatile matrix processing and paving the way for broader applications in AI acceleration and general-purpose high performance computing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes GEM3D CIM, a 3D-integrated SRAM-eDRAM hybrid compute-in-memory architecture for general-purpose matrix operations (transpose, element-wise addition, multiplication) performed directly in the memory crossbar at 4-bit precision. It uses a specialized transpose-based design, in-memory arithmetic, peripheral-aware circuits, and 3D SRAM-eDRAM stacking in GlobalFoundries 22 nm FDSOI technology, claiming to balance latency, energy efficiency, and compute density while remaining compatible with conventional CIM dot-product flows.

Significance. If the performance balance holds, the work would usefully generalize CIM beyond dot-product acceleration to versatile matrix processing, with potential benefits for AI accelerators and HPC. The hybrid 3D memory-on-memory approach and compatibility with existing CIM are practical strengths worth exploring; the transpose-based in-memory arithmetic concept is a clear architectural contribution.

major comments (2)
  1. [Abstract] Abstract: the central claim that the architecture 'balances latency, energy efficiency, and compute density' for general matrix operations rests on design assertions without any supporting quantitative results, simulation data, post-layout extraction, or analytical derivations; this is load-bearing because the balance is the primary performance assertion.
  2. [Architecture Description] Architecture and integration sections: the assumption that 3D SRAM-eDRAM stacking incurs no significant unaccounted overheads (TSV parasitics, thermal coupling, eDRAM retention) that would unbalance the claimed metrics is not addressed with any analysis or foundry-calibrated 3D simulation; this directly affects the weakest assumption identified in the stress-test note.
minor comments (1)
  1. [Abstract] Abstract: '22~nm' should be written as '22 nm' for standard notation consistency.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We have carefully reviewed the major comments and provide point-by-point responses below. We will incorporate revisions to address the concerns raised.

  1. Referee: [Abstract] Abstract: the central claim that the architecture 'balances latency, energy efficiency, and compute density' for general matrix operations rests on design assertions without any supporting quantitative results, simulation data, post-layout extraction, or analytical derivations; this is load-bearing because the balance is the primary performance assertion.

    Authors: We agree that the abstract would benefit from explicit reference to supporting evidence. The full manuscript includes post-layout simulations in GlobalFoundries 22 nm FDSOI technology that quantify latency, energy, and density for transpose, element-wise addition, and multiplication operations at 4-bit precision, with direct comparisons to baseline CIM dot-product designs and conventional processors. These results underpin the balance claim. We will revise the abstract to concisely incorporate key quantitative highlights from the results section.
    revision: yes

  2. Referee: [Architecture Description] Architecture and integration sections: the assumption that 3D SRAM-eDRAM stacking incurs no significant unaccounted overheads (TSV parasitics, thermal coupling, eDRAM retention) that would unbalance the claimed metrics is not addressed with any analysis or foundry-calibrated 3D simulation; this directly affects the weakest assumption identified in the stress-test note.

    Authors: We acknowledge that a more rigorous treatment of 3D integration overheads is warranted. The current manuscript focuses on the architectural benefits of SRAM-eDRAM stacking but does not include detailed quantification of TSV parasitics, thermal coupling, or eDRAM retention effects. In the revised version, we will add a dedicated analysis subsection presenting foundry-calibrated 3D simulations that bound these overheads and demonstrate they remain within acceptable limits without unbalancing the reported latency, energy, and density metrics.
    revision: yes

Circularity Check

0 steps flagged

No circularity detected; architectural proposal without self-referential derivations


The paper describes a proposed 3D SRAM-eDRAM hybrid CIM architecture for general matrix operations, asserting a balance of latency, energy, and density in GlobalFoundries 22 nm FDSOI. No equations, fitted parameters, or quantitative predictions appear in the abstract or description. Claims rest on design choices (transpose-based architecture, in-memory arithmetic, peripheral-aware design, 3D integration) rather than any derivation that reduces to its own inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. Because there is no mathematical derivation chain, the patterns of self-definitional, fitted-input, or self-citation circularity do not apply; the work is a forward-looking design statement.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on technology-integration assumptions rather than mathematical derivations; no free parameters are fitted to data and no new physical entities are postulated beyond the proposed circuit architecture itself.

axioms (1)
  • domain assumption: 3D integration of SRAM and eDRAM in 22 nm FDSOI is feasible and does not introduce prohibitive overheads in latency or energy.
    Invoked to support the claimed balance of metrics for the hybrid memory-on-memory structure.
invented entities (1)
  • GEM3D CIM transpose-based architecture (no independent evidence)
    purpose: Enable general matrix operations inside the memory crossbar.
    New design element introduced to generalize CIM beyond dot products.

pith-pipeline@v0.9.0 · 5503 in / 1339 out tokens · 53236 ms · 2026-05-10T12:04:37.889162+00:00 · methodology

