Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration

Aleksi Korsman; Gaurav Singh; Jelin Leslin; Jussi Ryynänen; Kazybek Adam; Marko Kosunen; Martin Andraud; Omar Numan; Otto Simola

arxiv: 2506.15440 · v1 · submitted 2025-06-18 · 💻 cs.AR

Omar Numan, Gaurav Singh, Kazybek Adam, Jelin Leslin, Aleksi Korsman, Otto Simola, Marko Kosunen, Jussi Ryynänen, Martin Andraud

Pith reviewed 2026-05-19 09:14 UTC · model grok-4.3

classification 💻 cs.AR

keywords compute-in-memory · mixed-signal CIM · RISC-V calibration · self-calibration · analog variation compensation · SNR improvement · 22 nm FDSOI · resistive memory

The pith

RISC-V controlled on-chip calibration raises mixed-signal CIM compute SNR by 25 to 45 percent to reach 18-24 dB.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates a mixed-signal compute-in-memory accelerator fabricated in 22 nm FDSOI that stores weights in SRAM while performing multi-bit operations with linear resistors. An integrated RISC-V processor runs an automated calibration routine that compensates for analog variations across different columns. This compensation lifts signal-to-noise ratio enough to support reliable computation for AI workloads. The design also supplies an open-source interface for programming and testing the CIM system. The same calibration approach is shown to extend to newer high-density resistor technologies for higher performance.

Core claim

The central claim is that embedding a RISC-V core to drive on-chip self-calibration in a resistive mixed-signal CIM SoC compensates analog variations, improving compute SNR by 25 to 45 percent across columns and reaching 18-24 dB while combining SRAM density with multi-bit resistive computation in a single 22 nm FDSOI chip.

What carries the argument

RISC-V controlled on-chip calibration routine that identifies and compensates analog variations in the resistive CIM columns.
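
A minimal sketch of what a per-column gain and offset calibration could look like, assuming each column behaves as a linear map `measured ≈ gain * ideal + offset`. The passages quoted elsewhere on this page mention a least-squares fit and a per-column gain, but the function names, test points, and noise values below are hypothetical and are not taken from the paper. An on-chip implementation would likely use fixed-point arithmetic rather than Python floats.

```python
import random

def fit_gain_offset(ideal, measured):
    """Ordinary least-squares fit of measured ~= gain * ideal + offset."""
    n = len(ideal)
    mean_x = sum(ideal) / n
    mean_y = sum(measured) / n
    sxx = sum((x - mean_x) ** 2 for x in ideal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ideal, measured))
    gain = sxy / sxx
    offset = mean_y - gain * mean_x
    return gain, offset

def correct(measured, gain, offset):
    """Invert the fitted linear model to map readings back to the ideal scale."""
    return [(y - offset) / gain for y in measured]

# Hypothetical column (assumed values): gain error 0.9, offset 3.0, small noise.
rng = random.Random(0)
test_points = [float(v) for v in range(-32, 33, 8)]
raw = [0.9 * x + 3.0 + rng.gauss(0.0, 0.2) for x in test_points]

gain, offset = fit_gain_offset(test_points, raw)
calibrated = correct(raw, gain, offset)
```

After the fit, `calibrated` should track `test_points` up to the residual noise. Whether a two-parameter linear model is sufficient for the real columns is exactly what the load-bearing premise on this page questions.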

If this is right

SRAM-based weight storage paired with linear resistors enables both density and multi-bit accuracy in one CIM core.
Integration with an open-source RISC-V processor supplies a practical path to end-to-end AI acceleration on the SoC.
The calibration method supports extension to emerging high-density linear resistor technologies for improved energy or speed.
Reliable 18-24 dB SNR makes mixed-signal CIM cores more viable for production AI inference hardware.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Production chips could ship with built-in calibration that removes the need for external test equipment at volume.
The same control loop might be reused to track aging or temperature drift during field operation.
Open-source programming interfaces could shorten the time for other groups to adopt similar calibrated CIM designs.

Load-bearing premise

The automated calibration routine driven by the RISC-V core can reliably detect and correct analog variations across columns without introducing new systematic errors or needing per-chip manual tuning.

What would settle it

Fabricate multiple chips, run the RISC-V calibration routine on each, measure compute SNR before and after on the same columns, and check whether the reported 25-45 percent gain appears consistently without extra off-chip adjustments.
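
The before-and-after comparison described above can be sketched as follows. This assumes compute SNR is the ratio of ideal-output power to error power, expressed in dB, and that the percentage gain is taken on the dB values; the page does not state which definition the paper uses, so both are assumptions. All data here is synthetic and hypothetical.

```python
import math
import random

def compute_snr_db(ideal, measured):
    """SNR in dB: mean power of the ideal outputs over mean power of the error."""
    signal_power = sum(x * x for x in ideal) / len(ideal)
    noise_power = sum((y - x) ** 2 for x, y in zip(ideal, measured)) / len(ideal)
    return 10.0 * math.log10(signal_power / noise_power)

def relative_gain_percent(snr_before_db, snr_after_db):
    """Relative change of the dB figure (assumed interpretation of the percentage)."""
    return 100.0 * (snr_after_db - snr_before_db) / snr_before_db

# Synthetic column: uncalibrated readings carry gain and offset error plus noise,
# calibrated readings carry only the noise.
rng = random.Random(1)
ideal = [rng.uniform(-32.0, 32.0) for _ in range(200)]
before = [0.9 * x + 3.0 + rng.gauss(0.0, 1.0) for x in ideal]
after = [x + rng.gauss(0.0, 1.0) for x in ideal]

snr_before = compute_snr_db(ideal, before)
snr_after = compute_snr_db(ideal, after)
gain_percent = relative_gain_percent(snr_before, snr_after)
```

Running the same two functions on every column of every measured die, and reporting the spread, would give the kind of statistics the referee report asks for. Under the dB-based interpretation, a hypothetical column moving from 16 dB to 20 dB would count as a 25 percent gain.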

Figures

Figures reproduced from arXiv: 2506.15440 by Aleksi Korsman, Gaurav Singh, Jelin Leslin, Jussi Ryynänen, Kazybek Adam, Marko Kosunen, Martin Andraud, Omar Numan, Otto Simola.

**Figure 2.** Proof-of-concept Acore-CIM SoC composed of a 32-bit RISC-V core controlling a ...

**Figure 3.** (a) 6-bit R-2R MDAC Input DAC Cell with an extra sign bit for ...

**Figure 5.** MDAC Weight Cell (MWC) schematic: Uses an R-2R MDAC with ...

**Figure 6.** Illustration of the open-source simulation framework of the proposed ...

**Figure 7.** Error distributions for a selected CIM column during characterization ...

**Figure 8.** (a) Uncalibrated MAC outputs across CIM columns. (b) Extracted per-column gain ( ...

**Figure 9.** A comparison of spatial variation enhancement across CIM columns ...

**Figure 10.** Compute SNR boost across CIM columns with BISC, achieving an ...
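
The captions for Figures 3 and 5 mention R-2R MDAC cells with a 6-bit input and a sign bit. As a rough illustration of the arithmetic such a cell idealizes, the sketch below models a sign-magnitude R-2R ladder as a scaling by `magnitude / 2**bits` and sums the products for one column. The 6-bit weight resolution and all function names are assumptions made for this example; the captions shown here do not state the weight width, and a real column adds the analog errors that calibration targets.

```python
INPUT_BITS = 6  # magnitude bits from the Figure 3 caption; the sign is separate

def mdac_fraction(sign, magnitude, bits=INPUT_BITS):
    """Ideal R-2R ladder: the reference is scaled by magnitude / 2**bits."""
    if sign not in (-1, 1):
        raise ValueError("sign must be -1 or 1")
    if not 0 <= magnitude < 2 ** bits:
        raise ValueError("magnitude out of range for the given bit width")
    return sign * magnitude / 2 ** bits

def ideal_column_mac(inputs, weights, weight_bits=6):
    """Sum of input-fraction times weight-fraction products for one column.

    inputs and weights are lists of (sign, magnitude) pairs.
    weight_bits=6 is an assumed value, not taken from the paper.
    """
    return sum(
        mdac_fraction(si, mi) * mdac_fraction(sw, mw, weight_bits)
        for (si, mi), (sw, mw) in zip(inputs, weights)
    )

# Two-element example: (+32/64 * +32/64) + (-16/64 * +32/64)
example = ideal_column_mac([(1, 32), (-1, 16)], [(1, 32), (1, 32)])
```

An ideal reference of this kind is what measured column outputs would be compared against when estimating error or SNR.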

Original abstract

Developing accurate and reliable Compute-In-Memory (CIM) architectures is becoming a key research focus to accelerate Artificial Intelligence (AI) tasks on hardware, particularly Deep Neural Networks (DNNs). In that regard, there has been significant interest in analog and mixed-signal CIM architectures aimed at increasing the efficiency of data storage and computation to handle the massive amount of data needed by DNNs. Specifically, resistive mixed-signal CIM cores are pushed by recent progresses in emerging Non-Volatile Memory (eNVM) solutions. Yet, mixed-signal CIM computing cores still face several integration and reliability challenges that hinder their large-scale adoption into end-to-end AI computing systems. In terms of integration, resistive and eNVM-based CIM cores need to be integrated with a control processor to realize end-to-end AI acceleration. Moreover, SRAM-based CIM architectures are still more efficient and easier to program than their eNVM counterparts. In terms of reliability, analog circuits are more susceptible to variations, leading to computation errors and degraded accuracy. This work addresses these two challenges by proposing a self-calibrated mixed-signal CIM accelerator SoC, fabricated in 22-nm FDSOI technology. The integration is facilitated by (1) the CIM architecture, combining the density and ease of SRAM-based weight storage with multi-bit computation using linear resistors, and (2) an open-source programming and testing strategy for CIM systems. The accuracy and reliability are enabled through an automated RISC-V controlled on-chip calibration, allowing us to improve the compute SNR by 25 to 45% across multiple columns to reach 18-24 dB. To showcase further integration possibilities, we show how our proof-of-concept SoC can be extended to recent high-density linear resistor technologies for enhanced computing performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

An SoC fabricated in 22-nm FDSOI integrates RISC-V control with resistive CIM and reports measured SNR gains after calibration, but the calibration method itself stays thinly described.

The main thing here is a fabricated 22-nm SoC that uses a RISC-V core to run self-calibration on a mixed-signal CIM array built with SRAM storage and linear resistors, delivering measured SNR improvements of 25-45% to reach 18-24 dB across columns.

What the paper does well is show a concrete way to integrate the analog compute core with a standard processor for end-to-end control and calibration. The choice to stick with SRAM for storage while using resistors for the compute part makes sense for programmability, and reporting real silicon measurements instead of simulations adds credibility. The mention of an open-source strategy for programming and testing is also a practical plus that could help adoption.

The soft spots center on the calibration details. The abstract states the SNR gains but skips how the RISC-V identifies column variations, applies corrections, or stores the coefficients. There is no information on the number of chips tested, error bars, or whether the routine works without per-chip manual intervention. If the full paper does not fill this in with algorithm descriptions and multi-die data, the reliability claim stays hard to evaluate fully.

This paper is for engineers working on CIM hardware for AI systems who care about bridging analog compute with digital control. A reader building similar resistive or mixed-signal prototypes would get value from the integration approach and the performance numbers. It is worth a serious referee because the silicon results address a relevant problem, even if the calibration section needs more evidence to stand up to scrutiny. I would send it out for peer review.

Referee Report

2 major / 1 minor

Summary. The manuscript presents Acore-CIM, a mixed-signal CIM accelerator SoC fabricated in 22-nm FDSOI technology. It combines SRAM-based weight storage with linear resistors for multi-bit analog computation, integrates an open-source RISC-V core for control and testing, and introduces an automated on-chip self-calibration routine to mitigate analog variations. The central result is a reported 25-45% improvement in compute SNR across columns, reaching 18-24 dB, with additional discussion on extending the approach to high-density resistor technologies for better performance.

Significance. If the on-chip calibration proves robust, reproducible across dies, and free of new systematic errors, the work would meaningfully advance practical mixed-signal CIM systems for DNN acceleration by solving both processor integration and variation-induced reliability issues. The SRAM-plus-resistor architecture and RISC-V-driven open-source flow are practical strengths that could aid adoption. The SNR gains, if supported by adequate experimental statistics, would provide useful evidence for the viability of calibrated analog CIM cores.

major comments (2)

The automated RISC-V-controlled calibration routine is load-bearing for the reliability claims yet is described only at a high level. No algorithm, pseudocode, or step-by-step sequence is provided showing how column-specific offsets and gains are identified using solely on-chip resources, how reference signals are generated, or how coefficients are stored and applied without introducing bias from RISC-V timing jitter or reference drift.
The reported SNR improvements (25-45% to 18-24 dB) lack supporting experimental details. No information is given on the number of chips tested, measurement conditions, error bars, or statistical analysis, which is required to substantiate that the gains are achieved reproducibly without per-device manual tuning.

minor comments (1)

The abstract and introduction could more explicitly distinguish the contributions of the resistor-based computation from the calibration routine to clarify the source of the SNR gains.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below and will revise the manuscript to provide the requested clarifications and additional details.

Referee: The automated RISC-V-controlled calibration routine is load-bearing for the reliability claims yet is described only at a high level. No algorithm, pseudocode, or step-by-step sequence is provided showing how column-specific offsets and gains are identified using solely on-chip resources, how reference signals are generated, or how coefficients are stored and applied without introducing bias from RISC-V timing jitter or reference drift.

Authors: We agree that the current high-level description of the calibration routine limits reproducibility. In the revised manuscript we will add a dedicated section with the full algorithm, pseudocode, and a step-by-step sequence that explains how column-specific offsets and gains are determined using only on-chip resources, how reference signals are generated, and how coefficients are stored and applied. We will also add explicit discussion of how the implementation avoids systematic bias from RISC-V timing jitter and reference drift, supported by our measured results.

Revision: yes

Referee: The reported SNR improvements (25-45% to 18-24 dB) lack supporting experimental details. No information is given on the number of chips tested, measurement conditions, error bars, or statistical analysis, which is required to substantiate that the gains are achieved reproducibly without per-device manual tuning.

Authors: We acknowledge that the experimental section currently lacks the requested statistics. In the revised manuscript we will expand the results to report the number of chips measured, the precise measurement conditions, error bars on the SNR values, and the statistical analysis performed. These additions will demonstrate that the reported 25-45% SNR gains are reproducible across devices without per-device manual tuning.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; results are empirical silicon measurements

The paper describes a fabricated mixed-signal CIM SoC in 22-nm FDSOI with a RISC-V-driven on-chip calibration routine. The central performance claims (25-45% SNR improvement to 18-24 dB across columns) are presented as direct post-silicon measurements rather than quantities derived from equations, fitted parameters, or self-citations. No load-bearing step reduces by construction to its own inputs; the work is self-contained against external benchmarks of fabricated hardware behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions about circuit variation in 22-nm FDSOI and the ability of a digital processor to measure and correct analog column behavior; no new physical entities or ad-hoc constants are introduced in the abstract.

axioms (1)

Domain assumption: Analog column variations in resistive CIM can be measured and corrected by a digital control loop without significant additional error sources.
Invoked when claiming that RISC-V controlled calibration improves SNR to 18-24 dB.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "automated RISC-V controlled on-chip calibration, allowing us to improve the compute SNR by 25 to 45% across multiple columns to reach 18-24 dB"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
Paper passage: "BISC routine... least-squares fit over Z test points... R1_SA and V1_CAL"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

Digital Versus Analog Artificial Intelligence Accelerators: Advances, trends, and emerging designs,

J.-s. Seo, J. Saikia, J. Meng, W. He, H.-s. Suh, Anupreetham, Y . Liao, A. Hasssan, and I. Yeo, “Digital Versus Analog Artificial Intelligence Accelerators: Advances, trends, and emerging designs,”IEEE Solid-State Circuits Magazine, vol. 14, no. 3, pp. 65–79, 2022

work page 2022

[2] [2]

Mixed-signal computing for deep neural network in- ference,

B. Murmann, “Mixed-signal computing for deep neural network in- ference,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 1, pp. 3–13, 2021

work page 2021

[3] [3]

An SRAM-Based Recon- figurable Cognitive Computation Matrix for Sensor Edge Applications,

S.-Y . Peng, I.-C. Liu, Y .-H. Wu, T.-J. Lin, C.-J. Chen, X.-Z. Li, Y .-Q. Cheng, P.-H. Lin, K.-H. Hung, and Y . Tsao, “An SRAM-Based Recon- figurable Cognitive Computation Matrix for Sensor Edge Applications,” IEEE Journal of Solid-State Circuits , vol. 59, no. 2, pp. 636–648, 2024

work page 2024

[4] [4]

C3SRAM: An In-Memory- Computing SRAM Macro Based on Robust Capacitive Coupling Com- puting Mechanism,

Z. Jiang, S. Yin, J.-S. Seo, and M. Seok, “C3SRAM: An In-Memory- Computing SRAM Macro Based on Robust Capacitive Coupling Com- puting Mechanism,” IEEE Journal of Solid-State Circuits, vol. 55, no. 7, pp. 1888–1897, 2020

work page 2020

[5] [5]

A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors,

W.-H. Chen, K.-X. Li, W.-Y . Lin, K.-H. Hsu, P.-Y . Li, C.-H. Yang, C.-X. Xue, E.-Y . Yang, Y .-K. Chen, Y .-S. Chang, T.-H. Hsu, Y .-C. King, C.-J. Lin, R.-S. Liu, C.-C. Hsieh, K.-T. Tang, and M.-F. Chang, “A 65nm 1Mb nonvolatile computing-in-memory ReRAM macro with sub-16ns multiply-and-accumulate for binary DNN AI edge processors,” in 2018 IEEE Interna...

work page 2018

[6] [6]

A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute,

H. Valavi, P. J. Ramadge, E. Nestler, and N. Verma, “A 64-Tile 2.4-Mb In-Memory-Computing CNN Accelerator Employing Charge-Domain Compute,” IEEE Journal of Solid-State Circuits , vol. 54, no. 6, pp. 1789–1799, 2019

work page 2019

[7] Y. He, J. Yue, X. Feng, Y. Huang, H. Jia, J. Wang, L. Zhang, W. Sun, H. Yang, and Y. Liu, “An RRAM-Based Digital Computing-in-Memory Macro With Dynamic Voltage Sense Amplifier and Sparse-Aware Approximate Adder Tree,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 70, no. 2, pp. 416–420, 2023.

[8] J.-M. Hung, T.-H. Wen, Y.-H. Huang, S.-P. Huang, F.-C. Chang, C.-I. Su, W.-S. Khwa, C.-C. Lo, R.-S. Liu, C.-C. Hsieh, K.-T. Tang, Y.-D. Chih, T.-Y. J. Chang, and M.-F. Chang, “8-b Precision 8-Mb ReRAM Compute-in-Memory Macro Using Direct-Current-Free Time-Domain Readout Scheme for AI Edge Devices,” IEEE Journal of Solid-State Circuits, vol. 58, no. 1, ...

[9] M. Giordano, K. Prabhu, K. Koul, R. M. Radway, A. Gural, R. Doshi, Z. F. Khan, J. W. Kustin, T. Liu, G. B. Lopes, V. Turbiner, W.-S. Khwa, Y.-D. Chih, M.-F. Chang, G. Lallement, B. Murmann, S. Mitra, and P. Raina, “CHIMERA: A 0.92 TOPS, 2.2 TOPS/W Edge AI Accelerator with 2 MByte On-Chip Foundry Resistive RAM for Efficient Training and Inference,” in 20...

[10] J.-H. Yoon, M. Chang, W.-S. Khwa, Y.-D. Chih, M.-F. Chang, and A. Raychowdhury, “A 40-nm 118.44-TOPS/W Voltage-Sensing Compute-in-Memory RRAM Macro With Write Verification and Multi-Bit Encoding,” IEEE Journal of Solid-State Circuits, vol. 57, no. 3, pp. 845–857, 2022.

[11] T. Shimoi, K. Matsubara, T. Saito, T. Ogawa, Y. Taito, Y. Kaneda, M. Izuna, K. Takeda, H. Mitani, T. Ito, and T. Kono, “A 22nm 32Mb Embedded STT-MRAM Macro Achieving 5.9ns Random Read Access and 5.8MB/s Write Throughput at up to Tj of 150 °C,” in 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, pp. 134–135.

[12] D. Saito, T. Kobayashi, H. Koga, N. Ronchi, K. Banerjee, Y. Shuto, J. Okuno, K. Konishi, L. Di Piazza, A. Mallik, J. Van Houdt, M. Tsukamoto, K. Ohkuri, T. Umebayashi, and T. Ezaki, “Analog In-memory Computing in FeFET-based 1T1R Array for Edge AI Applications,” in 2021 Symposium on VLSI Circuits, 2021, pp. 1–2.

[13] Le Gallo et al., “A 64-core mixed-signal in-memory compute chip based on phase-change memory for deep neural network inference,” vol. 6, no. 9, pp. 680–693, 2023.

[14] C. Sakr and N. Shanbhag, “An Analytical Method to Determine Minimum Per-Layer Precision of Deep Neural Networks,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018, pp. 1090–1094.

[15] N. R. Shanbhag and S. K. Roy, “Benchmarking In-Memory Computing Architectures,” IEEE Open Journal of the Solid-State Circuits Society, vol. 2, pp. 288–300, 2022.

[16] S. K. Roy and N. R. Shanbhag, “Energy-accuracy trade-offs for resistive in-memory computing architectures,” IEEE Journal on Exploratory Solid-State Computational Devices and Circuits, vol. 10, pp. 22–30, 2024.

[17] M. E. Sinangil, B. Erbagci, R. Naous, K. Akarvardar, D. Sun, W.-S. Khwa, H.-J. Liao, Y. Wang, and J. Chang, “A 7-nm Compute-in-Memory SRAM Macro Supporting Multi-Bit Input, Weight and Output and Achieving 351 TOPS/W and 372.4 GOPS,” IEEE Journal of Solid-State Circuits, vol. 56, no. 1, pp. 188–198, 2021.

[18] J.-O. Seo, M. Seok, and S. Cho, “A 44.2-TOPS/W CNN Processor With Variation-Tolerant Analog Datapath and Variation Compensating Circuit,” IEEE Journal of Solid-State Circuits, vol. 59, no. 5, pp. 1603–1611, 2024.

[19] J. M. Correll, L. Jie, S. Song, S. Lee, J. Zhu, W. Tang, L. Wormald, J. Erhardt, N. Breil, R. Quon, D. Kamalanathan, S. Krishnan, M. Chudzik, Z. Zhang, W. D. Lu, and M. P. Flynn, “An 8-bit 20.7 TOPS/W Multi-Level Cell ReRAM-based Compute Engine,” in 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), 2022, pp. 264–265.

[20] J.-F. Li, “Testing and reliability of computing-in memories: Solutions and challenges,” in 2022 IEEE International Test Conference in Asia (ITC-Asia), 2022, pp. 55–60.

[21] Z. Yan, X. S. Hu, and Y. Shi, “On the reliability of computing-in-memory accelerators for deep neural networks,” 2022. [Online]. Available: https://arxiv.org/abs/2205.13018

[22] R. Kier, R. Harrison, and R. Beer, “An MDAC synapse for analog neural networks,” in 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), vol. 5, 2004.

[23] R. Laajimi, B. Hamdi, and N. Ayari, “A low power bulk-driven MDAC synapse,” in 2011 International Conference on Applied Electronics, 2011, pp. 1–5.

[24] W. Choi, M. Kwak, S. Heo, K. Lee, S. Lee, and H. Hwang, “Hardware Neural Network using Hybrid Synapses via Transfer Learning: WOx Nano-Resistors and TiOx RRAM Synapse for Energy-Efficient Edge-AI Sensor,” in 2021 IEEE International Electron Devices Meeting (IEDM), 2021, pp. 23.1.1–23.1.4.

[25] C.-J. Jhang, C.-X. Xue, J.-M. Hung, F.-C. Chang, and M.-F. Chang, “Challenges and Trends of SRAM-Based Computing-In-Memory for AI Edge Devices,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 68, no. 5, pp. 1773–1786, 2021.

[26] N. R. Shanbhag and S. K. Roy, “Comprehending In-memory Computing Trends via Proper Benchmarking,” in 2022 IEEE Custom Integrated Circuits Conference (CICC), 2022, pp. 01–07.

[27] A. Levy, L. R. Upton, M. D. Scott, D. Rich, W.-S. Khwa, Y.-D. Chih, M.-F. Chang, S. Mitra, B. Murmann, and P. Raina, “EMBER: Efficient Multiple-Bits-Per-Cell Embedded RRAM Macro for High-Density Digital Storage,” IEEE Journal of Solid-State Circuits, vol. 59, no. 7, pp. 2081–2092, 2024.

[28] S. D. Spetalnick, M. Chang, S. Konno, B. Crafton, A. S. Lele, W.-S. Khwa, Y.-D. Chih, M.-F. Chang, and A. Raychowdhury, “A 40-nm Compute-in-Memory Macro With RRAM Addressing IR Drop and Off-State Current,” IEEE Solid-State Circuits Letters, vol. 7, pp. 10–13, 2024.

[29] S.-T. Wei, B. Gao, D. Wu, J.-S. Tang, H. Qian, and H.-Q. Wu, “Trends and challenges in the circuit and macro of RRAM-based computing-in-memory systems,” Chip, vol. 1, no. 1, p. 100004, 2022.

[30] K. Ueyoshi, I. A. Papistas, P. Houshmand, G. M. Sarda, V. Jain, M. Shi, Q. Zheng, S. Giraldo, P. Vrancx, J. Doevenspeck, D. Bhattacharjee, S. Cosemans, A. Mallik, P. Debacker, D. Verkest, and M. Verhelst, “DIANA: An End-to-End Energy-Efficient Digital and ANAlog Hybrid Neural Network SoC,” in 2022 IEEE International Solid-State Circuits Conference (ISSC...

[31] S. Jain, A. Sengupta, K. Roy, and A. Raghunathan, “RxNN: A Framework for Evaluating Deep Neural Networks on Resistive Crossbars,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, no. 2, pp. 326–338, 2021.

[32] A-core, an open-source RISC-V processor implementation GitLab repository. [Online]. Available: https://gitlab.com/a-core

[33] Chisel, software-defined hardware. [Online]. Available: https://www.chisel-lang.org/

[34] O. Golonzka et al., “Non-Volatile RRAM Embedded into 22FFL FinFET Technology,” in 2019 Symposium on VLSI Technology, 2019, pp. T230–T231.

[35] L. Deng, “The MNIST database of handwritten digit images for machine learning research,” IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 141–142, 2012.

Omar Numan (Student Member, IEEE) received his M.Sc. degree in Micro- and Nano-electronic Circuit Design from Aalto University, Finland, in 2020, and is currently pursuing a Ph.D. at the same institu...

He is currently pursuing his doctoral studies at the Department of Electronics and Nanoengineering, Aalto University, Finland. His research is in analog circuit design for AI accelerators, with a focus on sample-and-hold circuits, analog memories, and noise analysis.

Jelin Leslin (Student Member, IEEE) is a doctoral candidate at Aalto University specializi...

Currently, he is pursuing his Ph.D. degree at the institution. His research topic is programmatic circuit design, especially relating to processor hardware acceleration and digital signal processing applications.

Marko Kosunen (S’97–M’07) received his M.Sc, L.Sc and D.Sc (with honors) degrees from Helsinki University of Technology, Espoo, Finland, in 19...