Benchmarking Physical Performance of Neural Inference Circuits

Dmitri E. Nikonov; Ian A. Young

arxiv: 1907.05748 · v1 · pith:UY46OYWKnew · submitted 2019-07-12 · 💻 cs.ET · physics.app-ph

Pith reviewed 2026-05-24 22:18 UTC · model grok-4.3

keywords: neural networks, benchmarking, CMOS, beyond-CMOS, physical performance, inference circuits, area energy time, neuromorphic

The pith

A consistent benchmarking methodology estimates area, time, and energy for neural inference circuits across architectures and devices to identify promising combinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to compare physical performance of artificial, cellular, spiking, and oscillator neural networks built with both CMOS and beyond-CMOS devices such as spintronic, ferroelectric, and resistive memory. It applies one uniform estimation approach to metrics of area, time, and energy across several application cases. This produces a side-by-side ranking that points to which architecture-device pairings perform best under realistic hardware constraints. Readers would care because hardware limits directly determine whether large-scale neural inference can run efficiently in practice.

Core claim

By proposing and applying a consistent and transparent methodology, the work benchmarks physical performance metrics for multiple neural network types implemented in CMOS and beyond-CMOS technologies, then identifies the architecture and device combinations that deliver the strongest results for inference tasks.

What carries the argument

The consistent and transparent benchmarking methodology that combines device parameters drawn from literature into circuit-level estimates of area, time, and energy.
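
As an illustration of what such a circuit-level estimate can look like, here is a minimal sketch of a bottom-up roll-up from one synapse and one neuron to a neural gate. It is not the authors' model: the `Element` type, the `neural_gate` function, the aggregation rules (areas and energies add, parallel synapses contribute their delay once, interconnect ignored), and all numbers are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Area (um^2), delay (ns), and energy (fJ) of one circuit element."""
    area: float
    delay: float
    energy: float


def neural_gate(synapse: Element, neuron: Element, fan_in: int) -> Element:
    """Estimate one neural gate: fan_in synapses feeding a single neuron.

    Assumptions (hypothetical, not from the paper): synapses operate in
    parallel, so their delay counts once; areas and energies add;
    interconnect cost is ignored.
    """
    if fan_in < 1:
        raise ValueError("fan_in must be at least 1")
    return Element(
        area=fan_in * synapse.area + neuron.area,
        delay=synapse.delay + neuron.delay,
        energy=fan_in * synapse.energy + neuron.energy,
    )


# Placeholder device numbers for illustration only; not literature values.
synapse = Element(area=0.5, delay=1.0, energy=2.0)
neuron = Element(area=4.0, delay=3.0, energy=10.0)
gate = neural_gate(synapse, neuron, fan_in=8)
print(gate)
```

Swapping in a different `synapse` or `neuron` record is all it takes to re-estimate the gate for another device, which is the sense in which literature parameters feed the circuit-level numbers.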

If this is right

Beyond-CMOS devices can improve one or more of the three metrics relative to CMOS for selected neural architectures.
Different neural network types exhibit distinct trade-offs in area, time, and energy that depend on the underlying device technology.
The methodology supplies a common basis for comparing future device proposals without requiring immediate full-chip redesigns.
Promising combinations can be prioritized for further circuit-level development and application mapping.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same estimation approach could be reapplied when new device parameters become available to update the ranking without re-deriving the entire framework.
Results may inform which neural architectures are worth mapping onto emerging hardware platforms for edge inference.
If the estimates hold, hardware designers gain a way to narrow experimental focus to a smaller set of device-architecture pairs.

Load-bearing premise

That device parameters taken from published literature can be assembled into accurate, consistent circuit estimates without needing full custom simulations for every case.

What would settle it

Fabrication or detailed custom simulation of one or more of the benchmarked circuits that produces performance rankings different from those predicted by the literature-based estimates.
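
One way to quantify whether measured rankings differ from predicted ones is a rank-correlation score. The sketch below computes Kendall's tau (the tau-a variant, which ignores ties) between two sets of metric values. The function name, the option labels, and the numbers are hypothetical; the paper does not prescribe this test.

```python
from itertools import combinations


def kendall_tau(predicted: dict, measured: dict) -> float:
    """Kendall tau-a between two metric tables keyed by option name.

    Returns +1.0 when both tables order every pair of options the same
    way and -1.0 when every pair is ordered oppositely. Tied pairs count
    as neither concordant nor discordant.
    """
    if set(predicted) != set(measured):
        raise ValueError("both tables must cover the same options")
    names = sorted(predicted)
    if len(names) < 2:
        raise ValueError("need at least two options to compare rankings")
    concordant = discordant = 0
    for a, b in combinations(names, 2):
        sign = (predicted[a] - predicted[b]) * (measured[a] - measured[b])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    total_pairs = len(names) * (len(names) - 1) // 2
    return (concordant - discordant) / total_pairs


# Hypothetical energy estimates (arbitrary units); not data from the paper.
predicted = {"option_a": 1.0, "option_b": 2.0, "option_c": 3.0}
measured = {"option_a": 1.5, "option_b": 4.0, "option_c": 3.0}
print(kendall_tau(predicted, measured))
```

A value near 1 would indicate that fabrication or detailed simulation preserves the literature-based ordering; a value near 0 or below would indicate that it does not.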

Original abstract

Numerous neural network circuits and architectures are presently under active research for application to artificial intelligence and machine learning. Their physical performance metrics (area, time, energy) are estimated. Various types of neural networks (artificial, cellular, spiking, and oscillator) are implemented with multiple CMOS and beyond-CMOS (spintronic, ferroelectric, resistive memory) devices. A consistent and transparent methodology is proposed and used to benchmark this comprehensive set of options across several application cases. Promising architecture/device combinations are identified.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to estimate physical performance metrics (area, time, energy) for artificial, cellular, spiking, and oscillator neural networks implemented in CMOS and beyond-CMOS devices (spintronic, ferroelectric, resistive memory). It proposes and applies a consistent, transparent methodology to benchmark these options across application cases and identifies promising architecture/device combinations for AI/ML inference.

Significance. If the methodology is transparent and the literature-derived parameters are properly normalized, the work could serve as a useful reference for comparing hardware options for neural inference. The comprehensive scope across four network types and multiple device classes is a positive feature; the attempt to apply one methodology to all cases is noted as a strength.

major comments (2)

[Methodology] Methodology section: the central claim that a single transparent methodology produces fair rankings requires explicit documentation of cross-normalization for device parameters (on-resistance, switching energy, area, latency) drawn from disparate literature sources. Without shown steps for scaling to common Vdd, temperature, endurance, or failure-probability conditions, the area-time-energy products for oscillator vs. spiking vs. artificial networks rest on an untested commensurability assumption that directly affects which pairs are declared promising.
[Results] Results tables (application cases): the identification of promising combinations rests on unshown calculations; the manuscript must include the full derivation, error analysis, and data-source citations for each metric estimate, as the abstract provides none and the soundness of the rankings cannot be assessed without them.
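
To make the first major comment concrete, the sketch below shows how a ranking by area-time-energy product can flip once energies are rescaled to a common supply voltage. The quadratic scaling E ∝ Vdd² is the standard relation for capacitive switching energy and is an assumption here; it need not hold for spintronic or resistive devices, and the voltage dependence of delay is ignored. The device names and all numbers are hypothetical.

```python
def scale_energy_to_vdd(energy: float, vdd: float, vdd_ref: float) -> float:
    """Rescale a switching energy quoted at vdd to the reference vdd_ref.

    Assumes capacitive (C * V^2) switching energy; an illustrative
    assumption, not a rule taken from the paper.
    """
    if vdd <= 0 or vdd_ref <= 0:
        raise ValueError("supply voltages must be positive")
    return energy * (vdd_ref / vdd) ** 2


def ate_product(area: float, delay: float, energy: float) -> float:
    """Area-time-energy product; lower is better."""
    return area * delay * energy


# Hypothetical device entries quoted at different supply voltages.
devices = {
    "dev_x": {"area": 1.0, "delay": 2.0, "energy": 4.0, "vdd": 1.0},
    "dev_y": {"area": 1.0, "delay": 2.0, "energy": 2.0, "vdd": 0.5},
}
VDD_REF = 0.5  # common reference supply voltage (assumed)

raw = {
    name: ate_product(d["area"], d["delay"], d["energy"])
    for name, d in devices.items()
}
normalized = {
    name: ate_product(
        d["area"],
        d["delay"],
        scale_energy_to_vdd(d["energy"], d["vdd"], VDD_REF),
    )
    for name, d in devices.items()
}
print("best before normalization:", min(raw, key=raw.get))
print("best after normalization:", min(normalized, key=normalized.get))
```

With these placeholder values the preferred device changes after normalization, which is the kind of sensitivity the requested documentation would let a reader check.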

minor comments (2)

[Methodology] Notation for performance metrics (e.g., how area-time-energy product is defined) should be stated once in a dedicated subsection rather than repeated inline.
[Figures] Figure captions for benchmark plots should explicitly list the exact device parameters and literature references used for each bar.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and the recommendation for major revision. We agree that greater explicitness on normalization and derivations will strengthen the paper and address the concerns below by expanding the methodology and results sections.

Point-by-point responses

Referee: [Methodology] Methodology section: the central claim that a single transparent methodology produces fair rankings requires explicit documentation of cross-normalization for device parameters (on-resistance, switching energy, area, latency) drawn from disparate literature sources. Without shown steps for scaling to common Vdd, temperature, endurance, or failure-probability conditions, the area-time-energy products for oscillator vs. spiking vs. artificial networks rest on an untested commensurability assumption that directly affects which pairs are declared promising.

Authors: We accept this point. Although the methodology section outlines the consistent framework and cites the original device parameters, the explicit cross-normalization steps (scaling to common Vdd, temperature, endurance, and failure probability) are not presented in sufficient detail. We will revise by adding a new subsection (or appendix) that documents the normalization procedures, scaling factors, assumptions, and any sensitivity analysis performed. This will directly support the commensurability claim. revision: yes
Referee: [Results] Results tables (application cases): the identification of promising combinations rests on unshown calculations; the manuscript must include the full derivation, error analysis, and data-source citations for each metric estimate, as the abstract provides none and the soundness of the rankings cannot be assessed without them.

Authors: We agree that the current presentation summarizes the final metrics and cites sources in the text/tables but does not provide exhaustive per-estimate derivations or error analysis. We will revise by expanding the supplementary material (or adding an appendix) with full calculation traces, error propagation details, and consolidated source citations for every entry in the application-case tables. This will enable independent verification of the rankings. revision: yes

Circularity Check

0 steps flagged

No circularity: benchmarking relies on external literature parameters and proposed methodology without self-referential reductions.

Full rationale

The paper proposes and applies a consistent methodology to estimate area/time/energy metrics for multiple neural network types (artificial, cellular, spiking, oscillator) using device parameters drawn from external literature sources for CMOS and beyond-CMOS technologies. No equations, predictions, or central claims reduce by construction to fitted inputs or self-citations; the derivation chain treats literature values as independent inputs and produces comparative rankings as output. This matches the default expectation of a non-circular benchmarking study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; all such elements would require the full text.

Reference graph

Works this paper leans on

95 extracted references · 95 canonical work pages · 5 internal anchors

[1]

maximum accuracy

Introduction The unprecedented progress of traditional, Boolean computing over the last five decades has been propelled by the scaling of the transistor according to Moore’s law [1]. Recently a larger share of computing is being consumed by applications related to artificial intelligence (AI) and machine learning (ML). For these, Boolean computing...

work page
[2]

The elements at the input, synapses, receive vectors of input signals, xi, and multiply them by vectors of weights, wi

Fundamentals and Concepts of Neuromorphic Computing Operation in the majority of neural network architectures relies on a neural gate, often called the perceptron, Figure 1. The elements at the input, synapses, receive vectors of input signals, xi, and multiply them by vectors of weights, wi. Neurons perform the summation of these products and apply a non...

work page
[3]

Ferroelectric FET (FEFET) devices

Types of Neuromorphic Devices Synapses and neurons can be implemented by a variety of devices (Table 2): • Digital CMOS and analog CMOS or tunnel FET (TFET) devices. • Ferroelectric FET (FEFET) devices. • Spintronic devices [27] of five types: in-plane and perpendicular spin transfer torque (STT) switches with perpendicular magnetic anisotropy, spin orbit...

work page
[4]

ANN”,”CNN

Types of Neural Networks We classify neural networks into 4 types according to the nature of signals used, Figure 5. a) Artificial neural network (ANN) where outputs switch in response to inputs in a mostly monotonic fashion. b) Cellular neural networks (CeNN), which differ from ANN by their rectangular grid geometry and high connectivity. Here they are treated i...

work page
[5]

We refer to it as ‘bottoms-up benchmarking’

Treatment of interconnects The benchmarks for neural network elements, neural gates, and larger DNNs are built up hierarchically, from benchmarks for a synapse and a neuron obtained in the previous section. We refer to it as ‘bottoms-up benchmarking’. The chip comprises a number of neural cores with multiple neurons in each and multiple synapses feeding s...

work page
[6]

multiply-and-accumulate

Chip-level benchmarks The operation of the chip involves signals coming from input neurons, processed in synapses, and then firing of output neurons. A synaptic operation (synaptic event) is understood in non-spiking networks as an operation of multiplication of an input signal by a weight, i.e., “multiply-and-accumulate” (MAC). However in spiking netwo...

work page
[7]

hardware We considered examples of neuromorphic workloads, including

Neuromorphic computing workloads vs. hardware We considered examples of neuromorphic workloads, including

work page
[8]

CoNN such as LeNet [45,46], shown in Figure 9

work page
[9]

a single stage convolution of a 35x35 pixel image with 24 filters of 5x5 pixels

work page
[10]

a single stage associative memory of pixel patterns [22]

work page
[11]

a DNN for recognition of hand-written digits from the MNIST hand-written-digit image database [48] implemented as a multi-layer perceptron (MLP) with 784×256×128×10 fully-connected neurons in layers

work page
[12]

compute-in-memory

a DNN for speech recognition from [18] – a 4-layer MLP with 390x256x256x29 neurons. While all of these networks belong to the class of non-recurrent DNN, these are examples of ubiquitous applications required by users. However these workloads may not be favorable to SNNs. For example they do not utilize temporal information carried by spikes. The reade...

work page
[13]

In this chapter we consider mostly spiking neuromorphic chips [5,6,49]

Prototype neuromorphic chips We will compare the above benchmarks with those for prototype chips fabricated and measured by several groups of researchers. In this chapter we consider mostly spiking neuromorphic chips [5,6,49]. To them we apply ‘tops-down benchmarking’, i.e. calculate the neuron and synapse values from the total number of synapses, the tot...

work page
[14]

They are based on traditional digital chips and in this sense are different from other neuromorphic hardware

Digital Neural Accelerators There is another type of chip being fabricated, which are commonly called neural accelerators [3]. They are based on traditional digital chips and in this sense are different from other neuromorphic hardware. Unlike CPU and GPU chips which implement neural network algorithms in software, neural accelerators have dedicated hardw...

work page
[15]

Such energy-delay plots are provided both for synapses (Figure 15) and for neurons (Figure 16)

Results for physical performance The most informative view of the benchmarks is provided by the comparison of operation delay and energy. Such energy-delay plots are provided both for synapses (Figure 15) and for neurons (Figure 16). In many subsequent benchmarks the following technology options are found to be placed in close proximity to each other: th...

work page
[16]

dissipated power (Figure 20)

Throughput and Dissipated Power Circuit performance can be represented as computing throughput plotted vs. dissipated power (Figure 20). One notices that spintronic networks have a higher per unit area throughput due to the small size of their implementation of neurons and synapses. However this higher throughput results in very high dissipated power. If ...

work page
[17]

ANN and ONN show higher speed of operation at comparable energy vs

Conclusions In summary, the developed methodology described in this paper enables quantifying the effect of devices and NN types on the performance, power, and area of NNs. ANN and ONN show higher speed of operation at comparable energy vs. CeNN and SNN. This translates into a larger inference throughput especially under the limitation of power dissipatio...

work page
[18]

Acknowledgements The authors gratefully acknowledge discussions and critique by Narayan Srinivasa, Mike Mayberry, Sasikanth Manipatruni, Greg Chen, Ram Krishnamurthy, Chenyun Pan, Azad Naeemi, Dan Hammerstrom, Mike Davies, Eugenio Culurciello, Dmitri Strukov, Kaushik Roy, and Wolfgang Porod.

work page
[19]

Figure 22

Supplementary Materials Remaining benchmarking plots are collected here in order to keep the main text concise. Figure 22. Delay vs. area for synapses. Figure 23. Delay vs. area for synapses. Figure 24. Delay vs. area for neurons. Figure 25. Delay vs. area for neurons. Figure 26. Energy vs. delay for synapses. Figure 27. Energy vs. delay fo...

work page
[20]

Cramming more components onto integrated circuits

G. E. Moore, “Cramming more components onto integrated circuits”, Proceedings of IEEE 86, 82–85 (1998)

work page 1998
[21]

In-Datacenter Performance Analysis of a Tensor Processing Unit™

N. P. Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit™”, Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17), 1-12, Toronto, ON, Canada, June 24-28, 2017

work page 2017
[22]

Efficient Processing of Deep Neural Networks: A Tutorial and Survey,

V. Sze, Y. Chen, T. Yang and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," in Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017

work page 2017
[23]

A million spiking-neuron integrated circuit with a scalable communication network and interface,

Merolla, P.A., J.V. Arthur, R. Alvarez-Icaza, A S. Cassidy, J. Sawada, F. Akopyan, B.L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S.K. Esser, R. Appuswamy, B. Taba, A. Amir, M.D. Flickner, W.P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, 345(6...

work page 2014
[24]

Large-scale neuromorphic computing systems

S. Furber, “Large-scale neuromorphic computing systems”, J. Neural Eng. 13 (2016) 051001

work page 2016
[25]

Memory and Information Processing in Neuromorphic Systems,

G. Indiveri and S. Liu, "Memory and Information Processing in Neuromorphic Systems," in Proceedings of the IEEE, vol. 103, no. 8, pp. 1379-1397, Aug. 2015

work page 2015
[26]

Deep learning

Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning”, Nature 521, 436 (2015)

work page 2015
[27]

Overview of Beyond-CMOS Devices and a Uniform Methodology for Their Benchmarking

D. E. Nikonov and I. A. Young, “Overview of Beyond-CMOS Devices and a Uniform Methodology for Their Benchmarking”, Proc. IEEE 101, 2498 - 2533 (2013)

work page 2013
[28]

Benchmarking of Beyond-CMOS Exploratory Devices for Logic Integrated Circuits

D. E. Nikonov and I. A. Young, “Benchmarking of Beyond-CMOS Exploratory Devices for Logic Integrated Circuits”, IEEE J. Explor. Comput. Devices and Circuits 1, 3-11 (2015)

work page 2015
[29]

Benchmarking of devices in the Nanoelectronics Research Initiative

D. E. Nikonov and I. A. Young, “Benchmarking of devices in the Nanoelectronics Research Initiative”. [Online]. Available: https://nanohub.org/tools/nribench/browser/trunk/src (2019)

work page 2019
[30]

Artificial neural networks in hardware: A survey of two decades of progress

J. Misra and I. Saha, “Artificial neural networks in hardware: A survey of two decades of progress”, Neurocomputing, v. 74, no. 1–3, pp. 239-255 (2010)

work page 2010
[31]

A Survey of Neuromorphic Computing and Neural Networks in Hardware

C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, “A Survey of Neuromorphic Computing and Neural Networks in Hardware”, available online, arXiv:1705.06963v1

work page internal anchor Pith review Pith/arXiv arXiv
[32]

Q. Liu, G. Pineda-García, E. Stromatias, T. Serrano-Gotarredona, S. B. Furber, “Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation”, Frontiers in Neuroscience, v. 10, p. 469 (2016)

work page 2016
[33]

Neuro-inspired computing with emerging nonvolatile memorys,

S. Yu, "Neuro-inspired computing with emerging nonvolatile memorys," in Proceedings of the IEEE, vol. 106, no. 2, pp. 260-285, Feb. 2018

work page 2018
[34]

Achieving ideal accuracies in analog neuromorphic computing using periodic carry,

S. Agarwal et al., "Achieving ideal accuracies in analog neuromorphic computing using periodic carry," 2017 Symposium on VLSI Technology, Kyoto, 2017, pp. T174-T175

work page 2017
[35]

Evaluation of neural network architectures for embedded systems,

A. Canziani, E. Culurciello and A. Paszke, "Evaluation of neural network architectures for embedded systems," 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, 2017, pp. 1-4

work page 2017
[36]

An Analysis of Deep Neural Network Models for Practical Applications

A. Canziani, A. Paszke, and E. Culurciello, “An Analysis of Deep Neural Network Models for Practical Applications”, available online https://arxiv.org/abs/1605.07678 (2016)

work page internal anchor Pith review Pith/arXiv arXiv 2016
[37]

Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware

P. Blouw, X. Choo, E. Hunsberger, and C. Eliasmith, “Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware”, available online https://arxiv.org/abs/1812.01739 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[38]

NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures,

P. Chen, X. Peng and S. Yu, "NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures," 2017 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, 2017, pp. 6.1.1-6.1.4

work page 2017
[39]

A Method to Estimate the Energy Consumption of Deep Neural Networks,

T.-J. Yang, Y.-H. Chen, J. Emer, V. Sze, "A Method to Estimate the Energy Consumption of Deep Neural Networks," Asilomar Conference on Signals, Systems and Computers, Invited Paper, October 2017

work page 2017
[40]

Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator,

M. J. Marinella, S. Agarwal, A. Hsia, I. Richter, R. Jacobs-Gedrim, J. Niroula, S. J. Plimpton, E. Ipek, and C. D. James, "Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, pp. 86-101, March 2018

work page 2018
[41]

Non-Boolean Computing Benchmarking for Beyond-CMOS Devices Based on Cellular Neural Network

C. Pan, A. Naeemi, “Non-Boolean Computing Benchmarking for Beyond-CMOS Devices Based on Cellular Neural Network”, IEEE J. Explor. Comput. Devices and Circuits (2016)

work page 2016
[42]

Performance/price estimates for cortex-scale hardware: A design space exploration

M. S. Zaveri and D. Hammerstrom, “Performance/price estimates for cortex-scale hardware: A design space exploration”, Neural Networks 24 (2011) 291–304

work page 2011
[43]

Finding a roadmap to achieve large neuromorphic hardware systems

J. Hasler and B. Marr, “Finding a roadmap to achieve large neuromorphic hardware systems”, Frontiers in Neuroscience, 7, 118 (2013)

work page 2013
[44]

Performance analysis and benchmarking of all-spin spiking neural networks (Special session paper),

A. Sengupta, A. Ankit and K. Roy, "Performance analysis and benchmarking of all-spin spiking neural networks (Special session paper)," 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 4557-4563

work page 2017
[45]

Neuromorphic accelerators: A comparison between neuroscience and machine-learning approaches,

Z. Du et al., "Neuromorphic accelerators: A comparison between neuroscience and machine-learning approaches," 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Waikiki, HI, 2015, pp. 494-507

work page 2015
[46]

Magnetic Tunnel Junction Based Long-Term Short-Term Stochastic Synapse for a Spiking Neural Network with On-Chip STDP Learning

G. Srinivasan, A. Sengupta, and K. Roy, “Magnetic Tunnel Junction Based Long-Term Short-Term Stochastic Synapse for a Spiking Neural Network with On-Chip STDP Learning”, Scientific Reports 6, 29545 (2016)

work page 2016
[47]

Hybrid Spintronic-CMOS Spiking Neural Network with On-Chip Learning: Devices, Circuits, and Systems

A. Sengupta, A. Banerjee, and K. Roy, “Hybrid Spintronic-CMOS Spiking Neural Network with On-Chip Learning: Devices, Circuits, and Systems”, Phys. Rev. Appl. 6, 064003 (2016)

work page 2016
[48]

NeuroSim: A Circuit-Level Macro Model for Benchmarking Neuro-Inspired Architectures in Online Learning,

P. Chen, X. Peng and S. Yu, "NeuroSim: A Circuit-Level Macro Model for Benchmarking Neuro-Inspired Architectures in Online Learning," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 12, pp. 3067-3080, Dec. 2018

work page 2018
[49]

Analytical Models for Calculating Power and Performance of a CNN System

I. Palit, B. Sedighi, Q. Lou, M. Niemier, J. Nahas, X. S. Hu, “Analytical Models for Calculating Power and Performance of a CNN System”, unpublished

work page
[50]

Spintronic memristor through spin-torque-induced magnetization motion,

X. Wang, Y. Chen, H. Xi, H. Li, and D. Dimitrov, “Spintronic memristor through spin-torque-induced magnetization motion,” IEEE Electron Device Lett., vol. 30, no. 3, pp. 294–297, Mar. 2009

work page 2009
[51]

Benchmarking Inverse Rashba-Edelstein Magnetoelectric Devices for Neuromorphic Computing

A. W. Stephan, J. Hu, S. J. Koester, “Benchmarking Inverse Rashba-Edelstein Magnetoelectric Devices for Neuromorphic Computing”, available online https://arxiv.org/abs/1811.08624 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[52]

Spintronic Nanodevices for Bioinspired Computing,

J. Grollier, D. Querlioz and M. D. Stiles, "Spintronic Nanodevices for Bioinspired Computing," in Proceedings of the IEEE, vol. 104, no. 10, pp. 2024-2039, Oct. 2016

work page 2016
[53]

Ferroelectric FET analog synapse for acceleration of deep neural network training,

M. Jerry et al., "Ferroelectric FET analog synapse for acceleration of deep neural network training," 2017 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, 2017, pp. 6.2.1-6.2.4

work page 2017
[54]

Partial switching of ferroelectrics for synaptic weight storage,

E. W. Kinder, C. Alessandri, P. Pandey, G. Karbasian, S. Salahuddin and A. Seabaugh, "Partial switching of ferroelectrics for synaptic weight storage," 2017 75th Annual Device Research Conference (DRC), South Bend, IN, 2017, pp. 1-2

work page 2017
[55]

Memristor Crossbar-Based Neuromorphic Computing System: A Case Study,

M. Hu, H. Li, Y. Chen, Q. Wu, G. S. Rose and R. W. Linderman, "Memristor Crossbar-Based Neuromorphic Computing System: A Case Study," in IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 10, pp. 1864-1878, Oct. 2014

work page 2014
[56]

A spiking neuromorphic design with resistive crossbar,

C. Liu, B. Yan, C. Yang, L. Song, Z. Li, B. Liu, Y. Chen, H. Li, Q. Wu, H. Jiang, "A spiking neuromorphic design with resistive crossbar," 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, CA, 2015, pp. 1-6

work page 2015
[57]

High-Performance Mixed-Signal Neurocomputing With Nanoscale Floating-Gate Memory Cell Arrays,

F. Merrikh-Bayat, X. Guo, M. Klachko, M. Prezioso, K. K. Likharev and D. B. Strukov, "High-Performance Mixed-Signal Neurocomputing With Nanoscale Floating-Gate Memory Cell Arrays," in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 10, pp. 4782-4790, Oct. 2018

work page 2018
[58]

Energy-Efficient Time-Domain Vector-by-Matrix Multiplier for Neurocomputing and Beyond,

M. Bavandpour, M. R. Mahmoodi and D. B. Strukov, "Energy-Efficient Time-Domain Vector-by-Matrix Multiplier for Neurocomputing and Beyond," in IEEE Transactions on Circuits and Systems II: Express Briefs. (2019)

work page 2019
[59]

Spin-transfer torque magnetic memory as a stochastic memristive synapse

Vincent, A.F., Larroque, J., Zhao, W.S., Romdhane, N.B., Bichler, O., Gamrat, C., Klein, J.O., Galdin-Retailleau, S. and Querlioz, D., “Spin-transfer torque magnetic memory as a stochastic memristive synapse”. In 2014 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1074-1077 (2014)

work page 2014
[60]

SPINDLE: SPINtronic deep learning engine for large-scale neuromorphic computing

Ramasubramanian, S.G., Venkatesan, R., Sharad, M., Roy, K. and Raghunathan, A., “SPINDLE: SPINtronic deep learning engine for large-scale neuromorphic computing”, In Proceedings of the 2014 international symposium on Low power electronics and design, pp. 15-20 (2014)

work page 2014
[61]

A Mixed Signal Architecture for Convolutional Neural Networks

Q. Lou, C. Pan, J. McGuinness, A. Horvath, A. Naeemi, M. Niemier, and X. S. Hu, “A Mixed Signal Architecture for Convolutional Neural Networks”, ACM Journal on Emerging Technologies in Computing Systems (JETC), v. 15, no. 2, art. 19, April 2019

work page 2019
[62]

Enabling Spike-based Backpropagation in State-of-the-art Deep Neural Network Architectures

C. Lee, S. Shakib Sarwar, and K. Roy, “Enabling Spike-based Backpropagation in State-of-the-art Deep Neural Network Architectures”, available online https://arxiv.org/abs/1903.06379 (2019)

work page arXiv 2019
[63]

Power-efficient simulation of detailed cortical microcircuits on SpiNNaker

Sharp, T., Galluppi, F., Rast, A., and Furber, S., “Power-efficient simulation of detailed cortical microcircuits on SpiNNaker”, J. Neurosci. Methods 210, 110–118 (2012)

work page 2012
[64]

Handwritten digit recognition: Applications of neural network chips and automatic learning,

Y. LeCun, et al., “Handwritten digit recognition: Applications of neural network chips and automatic learning,” IEEE Commun. Mag., vol. 27, no. 11, pp. 41–46, Nov. 1989

work page 1989
[65]

Gradient-based learning applied to document recognition,

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, 86, 2278-2324 (1998)

work page 1998
[66]

Imagenet classification with deep convolutional neural networks

A. Krizhevsky, I. Sutskever, and G. Hinton. “Imagenet classification with deep convolutional neural networks”. In Advances in Neural Information Processing Systems 25, pp. 1097-1105 (2012)

work page 2012
[67]

Gradient-based learning applied to document recognition

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278-2324, November 1998.

work page 1998
[68]

Large-scale neuromorphic spiking array processors: A quest to mimic the brain

Thakur, C.S.T., Molin, J., Cauwenberghs, G., Indiveri, G., Kumar, K., Qiao, N., Schemmel, J., Wang, R.M., Chicca, E., Olson Hasler, J. and Seo, J.S., “Large-scale neuromorphic spiking array processors: A quest to mimic the brain”, Frontiers in neuroscience, 12, p.891 (2018)

work page 2018
[69]

Exploiting Inherent Error Resiliency of Deep Neural Networks to Achieve Extreme Energy Efficiency Through Mixed-Signal Neurons,

B. Chatterjee, P. Panda, S. Maity, A. Biswas, K. Roy and S. Sen, "Exploiting Inherent Error Resiliency of Deep Neural Networks to Achieve Extreme Energy Efficiency Through Mixed-Signal Neurons," in IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2019)

work page 2019
[70]

A waferscale neuromorphic hardware system for large-scale neural modeling,

Schemmel, J., D. Bruderle, A. Grubl, M. Hock, K. Meier, and S. Millner, “A waferscale neuromorphic hardware system for large-scale neural modeling,” Proc. 2010 IEEE Int. Symp. Circuits and Systems (ISCAS), 1947–1950, 2010

work page 2010
[71]

An Accelerated LIF Neuronal Network Array for a Large Scale Mixed-Signal Neuromorphic Architecture

S. A. Aamir, Y. Stradmann, P. Müller, C. Pehle, A. Hartel, A. Grübl, J. Schemmel, K. Meier, “An Accelerated LIF Neuronal Network Array for a Large Scale Mixed-Signal Neuromorphic Architecture”, available online arXiv 1804.01906 (2018)

[72]

A scalable neural chip with synaptic electronics using CMOS integrated memristors

J. M. Cruz-Albrecht, T. Derosier and N. Srinivasa, “A scalable neural chip with synaptic electronics using CMOS integrated memristors”, Nanotechnology 24, 384011 (2013)

[73]

SpiNNaker: A 1-W 18-Core System-on-Chip for Massively-Parallel Neural Network Simulation

E. Painkras et al., "SpiNNaker: A 1-W 18-Core System-on-Chip for Massively-Parallel Neural Network Simulation," in IEEE Journal of Solid-State Circuits, vol. 48, no. 8, pp. 1943-1953, Aug. 2013

[74]

Power analysis of large-scale, real-time neural networks on SpiNNaker

E. Stromatias, F. Galluppi, C. Patterson and S. Furber, "Power analysis of large-scale, real-time neural networks on SpiNNaker," The 2013 International Joint Conference on Neural Networks (IJCNN), Dallas, TX, 2013, pp. 1-8

[75]

A fixed point exponential function accelerator for a neuromorphic many-core system

J. Partzsch, S. Hoppner, M. Eberlein, R. Schuffny, C. Mayr, D. R. Lester, and S. Furber, “A fixed point exponential function accelerator for a neuromorphic many-core system,” in 2017 IEEE International Symposium on Circuits and Systems (ISCAS), May 2017, pp. 1–4

[76]

Real-time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ∼100× Speedup in Time-to-Solution and ∼100,000× Reduction in Energy-to-Solution

A. Cassidy et al., “Real-time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ∼100× Speedup in Time-to-Solution and ∼100,000× Reduction in Energy-to-Solution”, Proc. of International Conference for High Performance Computing, Networking, Storage and Analysis, SC14 (2014)

[77]

Neurogrid: A mixed analog-digital multichip system for large-scale neural simulations

Benjamin, B., P. Gao, E. McQuinn, S. Choudhary, A. Chandrasekaran, J. Bussat, R. Alvarez-Icaza, J. Arthur, P. Merolla, and K. Boahen, “Neurogrid: A mixed analog-digital multichip system for large-scale neural simulations,” Proc. IEEE, 102(5):699–716, 2014

[78]

65k-neuron 73-Mevents/s 22-pJ/event asynchronous micro-pipelined integrate-and-fire array transceiver

Park, J., S. Ha, T. Yu, E. Neftci, and G. Cauwenberghs, “65k-neuron 73-Mevents/s 22-pJ/event asynchronous micro-pipelined integrate-and-fire array transceiver,” Proc. 2014 IEEE Biomedical Circuits and Systems Conf. (BioCAS), 2014

[79]

A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128K synapses

N. Qiao, H. Mostafa, F. Corradi, M. Osswald, F. Stefanini, D. Sumislawska, and G. Indiveri, “A reconfigurable on-line learning spiking neuromorphic processor comprising 256 neurons and 128K synapses”, Frontiers in Neuroscience, v. 9, 141 (2015)

[80]

Neuromorphic architectures for spiking deep neural networks

G. Indiveri, F. Corradi and N. Qiao, "Neuromorphic architectures for spiking deep neural networks," 2015 IEEE International Electron Devices Meeting (IEDM), Washington, DC, 2015, pp. 4.2.1-4.2.4.

Showing first 80 references.

[1]

Introduction. The unprecedented progress of traditional Boolean computing over the last five decades has been propelled by transistor scaling according to Moore’s law [1]. Recently, a larger share of computing is being consumed by applications related to artificial intelligence (AI) and machine learning (ML). For these, Boolean computing...

[2]

Fundamentals and Concepts of Neuromorphic Computing. Operation in the majority of neural network architectures relies on a neural gate, often called the perceptron (Figure 1). The elements at the input, the synapses, receive vectors of input signals, x_i, and multiply them by vectors of weights, w_i. Neurons perform the summation of these products and apply a non...
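
As a minimal illustration of the neural gate described above, the following Python sketch multiplies each input by its weight, sums the products, and applies a nonlinearity. The function name `neural_gate`, the choice of a logistic sigmoid, and the sample values are assumptions made for illustration only; the excerpt does not state which nonlinear function is used.

```python
import math

def neural_gate(x, w):
    """Weighted sum of inputs followed by a nonlinearity (sigmoid assumed)."""
    if len(x) != len(w):
        raise ValueError("inputs and weights must have the same length")
    # Synapses: multiply each input signal x_i by its weight w_i.
    products = [xi * wi for xi, wi in zip(x, w)]
    # Neuron: sum the products, then apply the assumed nonlinearity.
    s = sum(products)
    return 1.0 / (1.0 + math.exp(-s))

# Hypothetical inputs and weights whose weighted sum is zero.
y = neural_gate([1.0, 0.0, 1.0], [0.5, -0.3, -0.5])
print(y)
```

Each product here corresponds to one synaptic operation, and the sum plus nonlinearity corresponds to one neuron operation.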

[3]

Types of Neuromorphic Devices. Synapses and neurons can be implemented by a variety of devices (Table 2):
• Digital CMOS and analog CMOS or tunnel FET (TFET) devices.
• Ferroelectric FET (FEFET) devices.
• Spintronic devices [27] of five types: in-plane and perpendicular spin transfer torque (STT) switches with perpendicular magnetic anisotropy, spin orbit...

[4]

Types of Neural Networks. We classify neural networks into four types according to the nature of the signals used (Figure 5). a) Artificial neural networks (ANN), where outputs switch in response to inputs in a mostly monotonic fashion. b) Cellular neural networks (CeNN), which differ from ANN by their rectangular grid geometry and high connectivity. Here they are treated i...

[5]

Treatment of interconnects. The benchmarks for neural network elements, neural gates, and larger DNNs are built up hierarchically from the benchmarks for a synapse and a neuron obtained in the previous section. We refer to this as ‘bottoms-up benchmarking’. The chip comprises a number of neural cores with multiple neurons in each and multiple synapses feeding s...

[6]

Chip-level benchmarks. The operation of the chip involves signals coming from input neurons, being processed in synapses, and then the firing of output neurons. A synaptic operation (synaptic event) is understood in non-spiking networks as the multiplication of an input signal by a weight, i.e., “multiply-and-accumulate” (MAC). However, in spiking netwo...

[7]

Neuromorphic computing workloads vs. hardware. We considered examples of neuromorphic workloads, including:

[8]

CoNN such as LeNet [45,46], shown in Figure 9

[9]

a single stage convolution of a 35x35 pixel image with 24 filters of 5x5 pixels

[10]

a single stage associative memory of pixel patterns [22]

[11]

a DNN for recognition of hand-written digits from the MNIST hand-written-digit image database [48] implemented as a multi-layer perceptron (MLP) with 784×256×128×10 fully-connected neurons in layers

[12]

a DNN for speech recognition from [18] – a 4-layer MLP with 390×256×256×29 neurons. While all of these networks belong to the class of non-recurrent DNNs, they are examples of ubiquitous applications required by users. However, these workloads may not be favorable to SNNs. For example, they do not utilize temporal information carried by spikes. The reade...
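
For the two fully connected workloads listed above, the number of synaptic weights follows directly from the quoted layer sizes. The sketch below counts them, assuming one weight per pair of neurons in consecutive layers, no bias terms, and one MAC per synapse per inference. The helper name `mlp_synapses` is hypothetical, and the resulting counts are an illustration rather than figures taken from the paper.

```python
def mlp_synapses(layers):
    """Count weights between consecutive fully connected layers (biases ignored)."""
    return sum(n_in * n_out for n_in, n_out in zip(layers, layers[1:]))

mnist_mlp = [784, 256, 128, 10]    # layer sizes quoted for the MNIST MLP
speech_mlp = [390, 256, 256, 29]   # layer sizes quoted for the speech MLP

print(mlp_synapses(mnist_mlp))   # 234752
print(mlp_synapses(speech_mlp))  # 172800
```

Under these assumptions, the MNIST network needs roughly 0.23 million MAC operations per inference and the speech network roughly 0.17 million.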

[13]

Prototype neuromorphic chips. We will compare the above benchmarks with those for prototype chips fabricated and measured by several groups of researchers. In this chapter we consider mostly spiking neuromorphic chips [5,6,49]. To them we apply ‘tops-down benchmarking’, i.e., we calculate the neuron and synapse values from the total number of synapses, the tot...

[14]

Digital Neural Accelerators. Chips of another type being fabricated are commonly called neural accelerators [3]. They are based on traditional digital chips and in this sense differ from other neuromorphic hardware. Unlike CPU and GPU chips, which implement neural network algorithms in software, neural accelerators have dedicated hardw...

[15]

Results for physical performance. The most informative view of the benchmarks is provided by the comparison of operation delay and energy. Such energy-delay plots are provided both for synapses (Figure 15) and for neurons (Figure 16). In many subsequent benchmarks the following technology options are found to be placed in close proximity to each other: th...

[16]

Throughput and Dissipated Power. Circuit performance can be represented as computing throughput plotted vs. dissipated power (Figure 20). One notices that spintronic networks have a higher throughput per unit area due to the small size of their implementation of neurons and synapses. However, this higher throughput results in very high dissipated power. If ...

[17]

Conclusions. In summary, the methodology developed in this paper enables quantifying the effect of devices and NN types on the performance, power, and area of NNs. ANN and ONN show higher speed of operation at comparable energy vs. CeNN and SNN. This translates into a larger inference throughput, especially under the limitation of power dissipatio...

[18]

Acknowledgements. The authors gratefully acknowledge discussions and critique by Narayan Srinivasa, Mike Mayberry, Sasikanth Manipatruni, Greg Chen, Ram Krishnamurthy, Chenyun Pan, Azad Naeemi, Dan Hammerstrom, Mike Davies, Eugenio Culurciello, Dmitri Strukov, Kaushik Roy, and Wolfgang Porod.

[19]

Supplementary Materials. Remaining benchmarking plots are collected here in order to keep the main text concise.
Figure 22. Delay vs. area for synapses.
Figure 23. Delay vs. area for synapses.
Figure 24. Delay vs. area for neurons.
Figure 25. Delay vs. area for neurons.
Figure 26. Energy vs. delay for synapses.
Figure 27. Energy vs. delay fo...

[20]

Cramming more components onto integrated circuits

G. E. Moore, “Cramming more components onto integrated circuits”, Proceedings of IEEE 86, 82–85 (1998)

[21]

In-Datacenter Performance Analysis of a Tensor Processing Unit

N. P. Jouppi et al., “In-Datacenter Performance Analysis of a Tensor Processing Unit”, Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA '17), 1-12, Toronto, ON, Canada, June 24-28, 2017

[22]

Efficient Processing of Deep Neural Networks: A Tutorial and Survey

V. Sze, Y. Chen, T. Yang and J. S. Emer, "Efficient Processing of Deep Neural Networks: A Tutorial and Survey," in Proceedings of the IEEE, vol. 105, no. 12, pp. 2295-2329, Dec. 2017.

[23]

A million spiking-neuron integrated circuit with a scalable communication network and interface

Merolla, P.A., J.V. Arthur, R. Alvarez-Icaza, A S. Cassidy, J. Sawada, F. Akopyan, B.L. Jackson, N. Imam, C. Guo, Y. Nakamura, B. Brezzo, I. Vo, S.K. Esser, R. Appuswamy, B. Taba, A. Amir, M.D. Flickner, W.P. Risk, R. Manohar, and D. S. Modha, “A million spiking-neuron integrated circuit with a scalable communication network and interface,” Science, 345(6...

[24]

Large-scale neuromorphic computing systems

S. Furber, “Large-scale neuromorphic computing systems”, J. Neural Eng. 13 (2016) 051001

[25]

Memory and Information Processing in Neuromorphic Systems

G. Indiveri and S. Liu, "Memory and Information Processing in Neuromorphic Systems," in Proceedings of the IEEE, vol. 103, no. 8, pp. 1379-1397, Aug. 2015

[26]

Deep learning

Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning”, Nature 521, 436 (2015)

[27]

Overview of Beyond-CMOS Devices and a Uniform Methodology for Their Benchmarking

D. E. Nikonov and I. A. Young, “Overview of Beyond-CMOS Devices and a Uniform Methodology for Their Benchmarking”, Proc. IEEE 101, 2498 - 2533 (2013)

[28]

Benchmarking of Beyond-CMOS Exploratory Devices for Logic Integrated Circuits

D. E. Nikonov and I. A. Young, “Benchmarking of Beyond-CMOS Exploratory Devices for Logic Integrated Circuits”, IEEE J. Explor. Comput. Devices and Circuits 1, 3-11 (2015)

[29]

Benchmarking of devices in the Nanoelectronics Research Initiative

D. E. Nikonov and I. A. Young, “Benchmarking of devices in the Nanoelectronics Research Initiative”. [Online]. Available: https://nanohub.org/tools/nribench/browser/trunk/src (2019)

[30]

Artificial neural networks in hardware: A survey of two decades of progress

J. Misra and I. Saha, “Artificial neural networks in hardware: A survey of two decades of progress”, Neurocomputing, v. 74, no. 1–3, pp. 239-255 (2010)

[31]

A Survey of Neuromorphic Computing and Neural Networks in Hardware

C. D. Schuman, T. E. Potok, R. M. Patton, J. D. Birdwell, M. E. Dean, G. S. Rose, and J. S. Plank, “A Survey of Neuromorphic Computing and Neural Networks in Hardware”, available online, arXiv:1705.06963v1

[32]

Q. Liu, G. Pineda-García, E. Stromatias, T. Serrano-Gotarredona, S. B. Furber, “Benchmarking Spike-Based Visual Recognition: A Dataset and Evaluation”, Frontiers in Neuroscience, v. 10, p. 469 (2016)

[33]

Neuro-inspired computing with emerging nonvolatile memorys

S. Yu, "Neuro-inspired computing with emerging nonvolatile memorys," in Proceedings of the IEEE, vol. 106, no. 2, pp. 260-285, Feb. 2018

[34]

Achieving ideal accuracies in analog neuromorphic computing using periodic carry

S. Agarwal et al., "Achieving ideal accuracies in analog neuromorphic computing using periodic carry," 2017 Symposium on VLSI Technology, Kyoto, 2017, pp. T174-T175

[35]

Evaluation of neural network architectures for embedded systems

A. Canziani, E. Culurciello and A. Paszke, "Evaluation of neural network architectures for embedded systems," 2017 IEEE International Symposium on Circuits and Systems (ISCAS), Baltimore, MD, 2017, pp. 1-4

[36]

An Analysis of Deep Neural Network Models for Practical Applications

A. Canziani, A. Paszke, and E. Culurciello, “An Analysis of Deep Neural Network Models for Practical Applications”, available online https://arxiv.org/abs/1605.07678 (2016)

[37]

Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware

P. Blouw, X. Choo, E. Hunsberger, and C. Eliasmith, “Benchmarking Keyword Spotting Efficiency on Neuromorphic Hardware”, available online https://arxiv.org/abs/1812.01739 (2018)

[38]

NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures

P. Chen, X. Peng and S. Yu, "NeuroSim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures," 2017 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, 2017, pp. 6.1.1-6.1.4.

[39]

A Method to Estimate the Energy Consumption of Deep Neural Networks

T.-J. Yang, Y.-H. Chen, J. Emer, V. Sze, "A Method to Estimate the Energy Consumption of Deep Neural Networks," Asilomar Conference on Signals, Systems and Computers, Invited Paper, October 2017

[40]

Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator

M. J. Marinella, S. Agarwal, A. Hsia, I. Richter, R. Jacobs-Gedrim, J. Niroula, S. J. Plimpton, E. Ipek, and C. D. James, "Multiscale Co-Design Analysis of Energy, Latency, Area, and Accuracy of a ReRAM Analog Neural Training Accelerator," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 8, no. 1, pp. 86-101, March 2018

[41]

Non-Boolean Computing Benchmarking for Beyond-CMOS Devices Based on Cellular Neural Network

C. Pan, A. Naeemi, “Non-Boolean Computing Benchmarking for Beyond-CMOS Devices Based on Cellular Neural Network”, IEEE J. Explor. Comput. Devices and Circuits (2016)

[42]

Performance/price estimates for cortex-scale hardware: A design space exploration

M. S. Zaveri and D. Hammerstrom, “Performance/price estimates for cortex-scale hardware: A design space exploration”, Neural Networks 24 (2011) 291–304

[43]

Finding a roadmap to achieve large neuromorphic hardware systems

J. Hasler and B. Marr, “Finding a roadmap to achieve large neuromorphic hardware systems”, Frontiers in Neuroscience, 7, 118 (2013)

[44]

Performance analysis and benchmarking of all-spin spiking neural networks (Special session paper)

A. Sengupta, A. Ankit and K. Roy, "Performance analysis and benchmarking of all-spin spiking neural networks (Special session paper)," 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 4557-4563

[45]

Neuromorphic accelerators: A comparison between neuroscience and machine-learning approaches

Z. Du et al., "Neuromorphic accelerators: A comparison between neuroscience and machine-learning approaches," 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Waikiki, HI, 2015, pp. 494-507

[46]

Magnetic Tunnel Junction Based Long-Term Short-Term Stochastic Synapse for a Spiking Neural Network with On-Chip STDP Learning

G. Srinivasan, A. Sengupta, and K. Roy, “Magnetic Tunnel Junction Based Long-Term Short-Term Stochastic Synapse for a Spiking Neural Network with On-Chip STDP Learning”, Scientific Reports 6, 29545 (2016)

[47]

Hybrid Spintronic-CMOS Spiking Neural Network with On-Chip Learning: Devices, Circuits, and Systems

A. Sengupta, A. Banerjee, and K. Roy, “Hybrid Spintronic-CMOS Spiking Neural Network with On-Chip Learning: Devices, Circuits, and Systems”, Phys. Rev. Appl. 6, 064003 (2016)

[48]

NeuroSim: A Circuit-Level Macro Model for Benchmarking Neuro-Inspired Architectures in Online Learning

P. Chen, X. Peng and S. Yu, "NeuroSim: A Circuit-Level Macro Model for Benchmarking Neuro-Inspired Architectures in Online Learning," in IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 37, no. 12, pp. 3067-3080, Dec. 2018

[49]

Analytical Models for Calculating Power and Performance of a CNN System

I. Palit, B. Sedighi, Q. Lou, M. Niemier, J. Nahas, X. S. Hu, “Analytical Models for Calculating Power and Performance of a CNN System”, unpublished

[50]

Spintronic memristor through spin-torque-induced magnetization motion

X. Wang, Y. Chen, H. Xi, H. Li, and D. Dimitrov, “Spintronic memristor through spin-torque-induced magnetization motion,” IEEE Electron Device Lett., vol. 30, no. 3, pp. 294–297, Mar. 2009

[51]

Benchmarking Inverse Rashba-Edelstein Magnetoelectric Devices for Neuromorphic Computing

A. W. Stephan, J. Hu, S. J. Koester, “Benchmarking Inverse Rashba-Edelstein Magnetoelectric Devices for Neuromorphic Computing”, available online https://arxiv.org/abs/1811.08624 (2018)

[52]

Spintronic Nanodevices for Bioinspired Computing

J. Grollier, D. Querlioz and M. D. Stiles, "Spintronic Nanodevices for Bioinspired Computing," in Proceedings of the IEEE, vol. 104, no. 10, pp. 2024-2039, Oct. 2016

[53]

Ferroelectric FET analog synapse for acceleration of deep neural network training

M. Jerry et al., "Ferroelectric FET analog synapse for acceleration of deep neural network training," 2017 IEEE International Electron Devices Meeting (IEDM), San Francisco, CA, 2017, pp. 6.2.1-6.2.4.

[54]

Partial switching of ferroelectrics for synaptic weight storage

E. W. Kinder, C. Alessandri, P. Pandey, G. Karbasian, S. Salahuddin and A. Seabaugh, "Partial switching of ferroelectrics for synaptic weight storage," 2017 75th Annual Device Research Conference (DRC), South Bend, IN, 2017, pp. 1-2

[55]

Memristor Crossbar-Based Neuromorphic Computing System: A Case Study

M. Hu, H. Li, Y. Chen, Q. Wu, G. S. Rose and R. W. Linderman, "Memristor Crossbar-Based Neuromorphic Computing System: A Case Study," in IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 10, pp. 1864-1878, Oct. 2014

[56]

A spiking neuromorphic design with resistive crossbar

C. Liu, B. Yan, C. Yang, L. Song, Z. Li, B. Liu, Y. Chen, H. Li, Q. Wu, H. Jiang, "A spiking neuromorphic design with resistive crossbar," 2015 52nd ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, CA, 2015, pp. 1-6

[57]

High-Performance Mixed-Signal Neurocomputing With Nanoscale Floating-Gate Memory Cell Arrays

F. Merrikh-Bayat, X. Guo, M. Klachko, M. Prezioso, K. K. Likharev and D. B. Strukov, "High-Performance Mixed-Signal Neurocomputing With Nanoscale Floating-Gate Memory Cell Arrays," in IEEE Transactions on Neural Networks and Learning Systems, vol. 29, no. 10, pp. 4782-4790, Oct. 2018

[58]

Energy-Efficient Time-Domain Vector-by-Matrix Multiplier for Neurocomputing and Beyond

M. Bavandpour, M. R. Mahmoodi and D. B. Strukov, "Energy-Efficient Time-Domain Vector-by-Matrix Multiplier for Neurocomputing and Beyond," in IEEE Transactions on Circuits and Systems II: Express Briefs (2019)

[59]

Spin-transfer torque magnetic memory as a stochastic memristive synapse

Vincent, A.F., Larroque, J., Zhao, W.S., Romdhane, N.B., Bichler, O., Gamrat, C., Klein, J.O., Galdin-Retailleau, S. and Querlioz, D., “Spin-transfer torque magnetic memory as a stochastic memristive synapse”. In 2014 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1074-1077 (2014)

[60]

SPINDLE: SPINtronic deep learning engine for large-scale neuromorphic computing

Ramasubramanian, S.G., Venkatesan, R., Sharad, M., Roy, K. and Raghunathan, A., “SPINDLE: SPINtronic deep learning engine for large-scale neuromorphic computing”, In Proceedings of the 2014 international symposium on Low power electronics and design, pp. 15-20 (2014)

[61]

A Mixed Signal Architecture for Convolutional Neural Networks

Q. Lou, C. Pan, J. McGuinness, A. Horvath, A. Naeemi, M. Niemier, and X. S. Hu, “A Mixed Signal Architecture for Convolutional Neural Networks”, ACM Journal on Emerging Technologies in Computing Systems (JETC), v. 15, no. 2, art. 19, April 2019

[62]

Enabling Spike-based Backpropagation in State-of-the-art Deep Neural Network Architectures

C. Lee, S. Shakib Sarwar, and K. Roy, “Enabling Spike-based Backpropagation in State-of-the-art Deep Neural Network Architectures”, available online https://arxiv.org/abs/1903.06379 (2019)

[63]

Power-efficient simulation of detailed cortical microcircuits on SpiNNaker

Sharp, T., Galluppi, F., Rast, A., and Furber, S., “Power-efficient simulation of detailed cortical microcircuits on SpiNNaker”, J. Neurosci. Methods 210, 110–118 (2012)

[64]

Handwritten digit recognition: Applications of neural network chips and automatic learning

Y. LeCun, et al., “Handwritten digit recognition: Applications of neural network chips and automatic learning,” IEEE Commun. Mag., vol. 27, no. 11, pp. 41–46, Nov. 1989
