X-CHANGR: Changing Memristive Crossbar Mapping for Mitigating Line-Resistance Induced Accuracy Degradation in Deep Neural Networks

Amogh Agrawal; Chankyu Lee; Kaushik Roy

arxiv: 1907.00285 · v1 · pith:SMRKKKJDnew · submitted 2019-06-29 · 💻 cs.ET

Pith reviewed 2026-05-25 12:22 UTC · model grok-4.3

classification 💻 cs.ET

keywords memristive crossbars · line resistance · DNN accuracy degradation · remapping strategy · sensitivity analysis · VGG16 · CIFAR10 · matrix-vector multiplication

The pith

Remapping sensitive kernels closer to the drivers in memristive crossbars cuts the DNN accuracy loss caused by line resistance from 5.6% to 2.1-2.9% on VGG16/CIFAR10.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that re-mapping the placement of kernels on memristive crossbar columns according to their sensitivity to line resistance can substantially reduce accuracy loss in deep neural networks. Line resistance causes voltage drops that affect columns farther from the drivers more severely, so ranking the kernels and placing the most sensitive ones closer to the drivers lessens the overall impact. Two algorithms are presented, a static remapping strategy (SRS) and a dynamic remapping strategy (DRS), both of which optimize this arrangement for a pre-trained network. On VGG16 with CIFAR10, SRS and DRS limit degradation to 2.9% and 2.1%, respectively, versus 5.6% for the as-is mapping. Readers would care because this provides a post-training optimization for hardware accelerators without requiring retraining or changes to the learned weights.

Core claim

By performing sensitivity analysis on DNN weights and kernels and re-arranging crossbar columns so that the most sensitive kernels are placed closer to the drivers, the static remapping strategy (SRS) and dynamic remapping strategy (DRS) reduce the accuracy degradation due to line resistance from 5.6% to 2.9% and 2.1%, respectively, for a VGG16 network on CIFAR10 without retraining the weights.
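To put the reported figures in proportion, the snippet below converts them into the fraction of the baseline degradation that each strategy removes. It uses only the three numbers quoted above; the helper name `relative_reduction` is illustrative and not taken from the paper.

```python
def relative_reduction(baseline_drop: float, mitigated_drop: float) -> float:
    """Fraction of the baseline accuracy degradation removed by a mitigation."""
    return (baseline_drop - mitigated_drop) / baseline_drop


BASELINE = 5.6  # reported degradation (%) with the as-is mapping
SRS = 2.9       # reported degradation (%) with static remapping
DRS = 2.1       # reported degradation (%) with dynamic remapping

print(f"SRS removes {relative_reduction(BASELINE, SRS):.1%} of the degradation")
print(f"DRS removes {relative_reduction(BASELINE, DRS):.1%} of the degradation")
```

On these numbers SRS removes a little under half of the degradation and DRS a little under two thirds.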

What carries the argument

Sensitivity-ranked column remapping of kernels in resistive crossbars to minimize the effect of line-resistance induced voltage drops.
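A minimal sketch of what sensitivity-ranked column remapping could look like, assuming each kernel already has a scalar sensitivity score and that column 0 is the column closest to the drivers. The function names and the example scores are hypothetical; this is not the paper's SRS or DRS algorithm, only the sort-and-place idea stated above.

```python
from typing import List, Sequence


def remap_columns(sensitivity: Sequence[float]) -> List[int]:
    """Return perm, where perm[c] is the kernel index placed at column c.

    Assumption: column 0 is nearest the drivers, so the most sensitive
    kernel is placed first.
    """
    return sorted(range(len(sensitivity)), key=lambda k: -sensitivity[k])


def restore_order(column_outputs: Sequence[float], perm: Sequence[int]) -> List[float]:
    """Undo the permutation so that output k belongs to kernel k again."""
    restored = [0.0] * len(perm)
    for column, kernel in enumerate(perm):
        restored[kernel] = column_outputs[column]
    return restored


sens = [0.2, 0.9, 0.1, 0.5]                  # hypothetical sensitivity scores
perm = remap_columns(sens)
outputs_by_column = [10.0, 30.0, 0.0, 20.0]  # hypothetical per-column outputs
print(perm)
print(restore_order(outputs_by_column, perm))
```

Because permuting columns also permutes the output channels, any such scheme has to restore the original order (or permute the next layer's inputs to match) so that the network's function is unchanged.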

If this is right

SRS and DRS reduce accuracy degradation without retraining the network weights.
The remapping strategies can be combined with existing mitigation methods such as in-situ compensation.
Benefits are shown for VGG16 trained on CIFAR10, with SRS achieving 2.9% degradation and DRS achieving 2.1%.
Static remapping uses a fixed arrangement while dynamic remapping allows adjustments during operation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sensitivity ranking might be reused across similar network architectures if the dominant error sources remain line resistance.
Hardware implementations could incorporate the ranking step as a one-time preprocessing cost before deployment.
Extending the method to multi-layer mappings might require checking interactions between layers' column assignments.

Load-bearing premise

A sensitivity analysis performed on the pre-trained network accurately identifies which kernels are most affected by column voltage drops in the physical crossbar.
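One way to make this premise concrete is a perturbation-based score: attenuate one kernel's output slightly and measure how much the model output moves. The sketch below does this for a toy one-layer model in pure Python. The model, the attenuation factor `eps`, and the function names are all assumptions for illustration; the abstract does not say how the paper computes sensitivity.

```python
import random


def relu(z: float) -> float:
    return z if z > 0 else 0.0


def forward(x, kernels, readout, scale=None):
    """Toy model: score = sum_k readout[k] * relu(scale[k] * <kernels[k], x>)."""
    total = 0.0
    for k, (w, a) in enumerate(zip(kernels, readout)):
        dot = sum(wi * xi for wi, xi in zip(w, x))
        if scale is not None:
            dot *= scale[k]
        total += a * relu(dot)
    return total


def kernel_sensitivity(inputs, kernels, readout, eps=0.05):
    """Mean squared output change when one kernel's pre-activation is scaled by (1 - eps)."""
    n = len(kernels)
    scores = []
    for k in range(n):
        scale = [1.0] * n
        scale[k] = 1.0 - eps  # stand-in for a voltage-drop-like attenuation
        change = sum(
            (forward(x, kernels, readout, scale) - forward(x, kernels, readout)) ** 2
            for x in inputs
        ) / len(inputs)
        scores.append(change)
    return scores


random.seed(0)
kernels = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
readout = [0.1, 1.0, 3.0]  # hypothetical downstream weights
inputs = [[random.uniform(0, 1) for _ in range(4)] for _ in range(50)]
scores = kernel_sensitivity(inputs, kernels, readout)
print(sorted(range(len(scores)), key=lambda k: -scores[k]))  # most sensitive first
```

Whether a score of this kind, computed on the ideal model, predicts the error seen on a physical crossbar is exactly the question the premise leaves open.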

What would settle it

Fabricating a crossbar array, applying the same VGG16 weights with and without the proposed remapping, and measuring the resulting accuracy on CIFAR10 test images.

Figures

Figures reproduced from arXiv: 1907.00285 by Amogh Agrawal, Chankyu Lee, Kaushik Roy.

**Figure 2.** Illustration of kernels of a convolutional neural network being mapped to crossbar arrays. Each kernel is flattened and stored as a column vector in …

**Figure 4.** (a) Scatter plot showing the output current from crossbar obtained from …

**Figure 5.** (a) Variation of the fitting parameters m, c and σ, as a function of crossbar column number. (b) The fitting parameter m vs crossbar column number for various eNVM technologies listed in Table I. Other fitting parameters follow a similar trend.

**Figure 6.** Classification accuracy for the CIFAR10 dataset on a VGG16 network …
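The Figure 5 caption mentions per-column fitting parameters m, c and σ relating ideal and observed currents. A plausible reading, assumed here and not confirmed by the text on this page, is a linear fit of the form observed ≈ m · ideal + c with residual spread σ. The sketch below fits that form by ordinary least squares on synthetic data; the decay of the gain with column index is a made-up illustration.

```python
import math
import random


def fit_column(ideal, observed):
    """Least-squares fit observed ~= m * ideal + c; sigma is the residual std."""
    n = len(ideal)
    mean_x = sum(ideal) / n
    mean_y = sum(observed) / n
    sxx = sum((x - mean_x) ** 2 for x in ideal)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ideal, observed))
    m = sxy / sxx
    c = mean_y - m * mean_x
    sigma = math.sqrt(sum((y - (m * x + c)) ** 2 for x, y in zip(ideal, observed)) / n)
    return m, c, sigma


random.seed(0)
ideal = [random.uniform(0.0, 1.0) for _ in range(200)]
for column in (0, 32, 63):
    m_true = 1.0 - 0.002 * column  # assumed: gain falls with distance from the drivers
    observed = [m_true * i + random.gauss(0.0, 0.01) for i in ideal]
    m, c, sigma = fit_column(ideal, observed)
    print(column, round(m, 3), round(c, 3), round(sigma, 3))
```

A table of such fitted triples, one per column, would be enough to emulate column-dependent error in software without solving the full resistive network.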

Original abstract

There is widespread interest in emerging technologies, especially resistive crossbars for accelerating Deep Neural Networks (DNNs). Resistive crossbars offer a highly-parallel and efficient matrix-vector-multiplication (MVM) operation. MVM being the most dominant operation in DNNs makes crossbars ideally suited. However, various sources of device and circuit non-idealities lead to errors in the MVM output, thereby reducing DNN accuracy. Towards that end, we propose crossbar re-mapping strategies to mitigate line-resistance induced accuracy degradation in DNNs, without having to re-train the learned weights, unlike most prior works. Line-resistances degrade the voltage levels along the crossbar columns, thereby inducing more errors at the columns away from the drivers. We rank the DNN weights and kernels based on a sensitivity analysis, and re-arrange the columns such that the most sensitive kernels are mapped closer to the drivers, thereby minimizing the impact of errors on the overall accuracy. We propose two algorithms, static remapping strategy (SRS) and dynamic remapping strategy (DRS), to optimize the crossbar re-arrangement of a pre-trained DNN. We demonstrate the benefits of our approach on a standard VGG16 network trained using CIFAR10 dataset. Our results show that SRS and DRS limit the accuracy degradation to 2.9% and 2.1%, respectively, compared to a 5.6% drop from an as-is mapping of weights and kernels to crossbars. We believe this work brings an additional aspect for optimization, which can be used in tandem with existing mitigation techniques, such as in-situ compensation, technology aware training and re-training approaches, to enhance system performance.
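For readers new to crossbars, the ideal MVM the abstract refers to follows from Ohm's law and Kirchhoff's current law: with input voltages V on the rows and conductances G at the crosspoints, each column current is I_j = Σ_i V_i · G_ij. The sketch below computes that ideal result in pure Python. It deliberately ignores line resistance and every other non-ideality, and the function name is illustrative.

```python
def crossbar_mvm(voltages, conductances):
    """Ideal crossbar: column current I_j = sum_i V_i * G[i][j].

    voltages:     one input voltage per row
    conductances: conductances[i][j] is the device at row i, column j
    """
    n_cols = len(conductances[0])
    return [
        sum(v * row[j] for v, row in zip(voltages, conductances))
        for j in range(n_cols)
    ]


V = [1.0, 0.5]            # hypothetical row voltages
G = [[1.0, 2.0],          # hypothetical conductances; each column holds one kernel
     [4.0, 6.0]]
print(crossbar_mvm(V, G))
```

Line resistance makes the voltage actually seen by each device differ from the applied V_i, which is why the measured column currents drift from this ideal sum.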

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Remapping high-sensitivity kernels closer to drivers cuts the reported accuracy drop from 5.6% to 2.1-2.9% on VGG16/CIFAR10, but the pre-trained sensitivity ranking may not track actual column IR-drop errors.

The paper's core result is that two remapping algorithms, SRS and DRS, move sensitive kernels nearer the drivers and thereby halve the accuracy loss from line resistance on this VGG16/CIFAR10 setup. No retraining is required, which is a practical plus for already-trained models running on resistive crossbars. The approach is new in the specific combination of sensitivity ranking plus static and dynamic column rearrangement; prior work on crossbar non-idealities does not describe these two procedures. The authors also note that the method can sit alongside compensation or retraining techniques, which is a reasonable engineering stance. The numbers come from a standard network and dataset, so the claim is at least reproducible in principle on the same benchmark.

The soft spot is the reliance on sensitivity computed from the ideal pre-trained network. Column voltage drop is a position-dependent, conductance-weighted effect, so the kernels that actually suffer most once mapped could differ from the ranking derived without resistance. The abstract gives no detail on how sensitivity is calculated, no ablation of the ranking step, and no error bars on the accuracy figures. If the correlation between ideal sensitivity and real error is weak, the reported gains could be tied to this particular mapping rather than a general fix.

The work is aimed at hardware teams building memristive DNN accelerators who already deal with line-resistance modeling. It is narrow but concrete, so a serious editor should send it to review rather than desk-reject; the idea is implementable and the empirical result is worth checking even if the validation needs strengthening.

Referee Report

2 major / 1 minor

Summary. The manuscript presents X-CHANGR, a set of crossbar remapping strategies (SRS and DRS) for mitigating line-resistance induced accuracy loss in memristive crossbar implementations of DNNs. The approach ranks kernels by sensitivity computed on the pre-trained model and remaps columns to place high-sensitivity kernels closer to the drivers. On VGG16 trained on CIFAR-10, it reports reducing accuracy degradation from 5.6% (naive mapping) to 2.9% (SRS) and 2.1% (DRS) without retraining the weights.

Significance. If the empirical results hold under more rigorous validation, this provides a low-overhead, post-training optimization for resistive-crossbar DNN accelerators that can be used alongside existing compensation or training-aware techniques. The concrete numbers on a standard VGG16/CIFAR-10 benchmark indicate practical relevance for hardware mapping.

major comments (2)

[Abstract] Abstract and results: the central claims report accuracy degradation limited to 2.9% (SRS) and 2.1% (DRS) versus 5.6% baseline, yet supply no error bars, no description of how the sensitivity ranking is computed, and no ablation of the ranking method itself. These omissions leave the numerical support for the mitigation only partially substantiated.
[Proposed remapping strategies] Proposed approach: sensitivity is derived from the ideal pre-trained network without line resistance. Column voltage drop is a position-dependent, conductance-weighted effect; the manuscript provides no simulation or test confirming that the pre-trained ranking predicts actual per-kernel error once kernels are mapped to specific columns. If this correlation is weak, the observed benefit may be specific to the VGG16 mapping rather than a general property of the method.
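The ablation requested in the first major comment can be prototyped on a toy cost before running any crossbar simulation. The sketch below assumes an additive proxy, cost = Σ over columns of (sensitivity of the kernel placed there) × (error level of that column), with error growing away from the drivers. Both the proxy and the numbers are assumptions and not results from the paper. Under this proxy, sorting by sensitivity is optimal by the rearrangement inequality, which the exhaustive check confirms for a four-kernel case.

```python
import itertools
import random


def mapping_cost(perm, sensitivity, column_error):
    """Proxy cost: kernel perm[c] sits at column c and pays sensitivity * column error."""
    return sum(sensitivity[k] * column_error[c] for c, k in enumerate(perm))


def ranked_mapping(sensitivity):
    """Most sensitive kernel at column 0 (assumed nearest the drivers)."""
    return sorted(range(len(sensitivity)), key=lambda k: -sensitivity[k])


def random_baseline(sensitivity, column_error, trials=1000, seed=0):
    """Average proxy cost over random column permutations."""
    rng = random.Random(seed)
    idx = list(range(len(sensitivity)))
    total = 0.0
    for _ in range(trials):
        rng.shuffle(idx)
        total += mapping_cost(idx, sensitivity, column_error)
    return total / trials


sens = [0.2, 0.9, 0.1, 0.5]  # hypothetical sensitivities
err = [0.0, 1.0, 2.0, 3.0]   # hypothetical error level per column
ranked = mapping_cost(ranked_mapping(sens), sens, err)
best = min(mapping_cost(p, sens, err) for p in itertools.permutations(range(4)))
print("ranked:", ranked, "exhaustive best:", best)
print("random average:", random_baseline(sens, err))
```

The toy only shows that the ranking helps when the additive proxy holds. The referee's point is that the real accuracy loss may not decompose this way, so the ablation still has to be run on the simulated crossbar.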

minor comments (1)

[Abstract] The abstract would benefit from a one-sentence statement of the line-resistance model parameters (e.g., resistance per segment, crossbar dimensions) used to generate the reported numbers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on substantiating the numerical claims and validating the sensitivity-based approach. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation.

Referee: [Abstract] Abstract and results: the central claims report accuracy degradation limited to 2.9% (SRS) and 2.1% (DRS) versus 5.6% baseline, yet supply no error bars, no description of how the sensitivity ranking is computed, and no ablation of the ranking method itself. These omissions leave the numerical support for the mitigation only partially substantiated.

Authors: We agree that the abstract and results would benefit from greater detail. In revision we will add an explicit description of the sensitivity computation (ranking kernels by the accuracy impact of position-dependent weight perturbations in the ideal pre-trained model). We will also insert an ablation comparing sensitivity-ranked remapping against random column permutation to isolate the contribution of the ranking. The reported figures are deterministic simulation outcomes on a fixed model and mapping; we will clarify this and note the absence of stochastic variation rather than add error bars. revision: partial
Referee: [Proposed remapping strategies] Proposed approach: sensitivity is derived from the ideal pre-trained network without line resistance. Column voltage drop is a position-dependent, conductance-weighted effect; the manuscript provides no simulation or test confirming that the pre-trained ranking predicts actual per-kernel error once kernels are mapped to specific columns. If this correlation is weak, the observed benefit may be specific to the VGG16 mapping rather than a general property of the method.

Authors: The ranking is deliberately computed on the ideal model because the method is a post-training, no-retrain optimization. The rationale is that kernels whose outputs are most sensitive to perturbations will suffer disproportionately from the larger voltage drops at distant columns. While the original manuscript relies on end-to-end accuracy improvement as evidence, we acknowledge the value of an explicit correlation check. We will add a new analysis (and figure) that maps each kernel’s sensitivity score against its measured output error at different column positions, thereby testing whether the pre-trained ranking predicts per-kernel degradation under line resistance. revision: yes
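The correlation check promised in the second response could be as simple as a rank correlation between each kernel's sensitivity score and its measured output error. The sketch below implements Spearman's coefficient in pure Python, using average ranks for ties. The input lists are placeholders; no such per-kernel data appears on this page.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for t in range(i, j + 1):
            result[order[t]] = average
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


sensitivity_scores = [0.2, 0.9, 0.1, 0.5, 0.7]      # placeholder values
measured_errors = [0.03, 0.10, 0.02, 0.08, 0.06]    # placeholder values
print(round(spearman(sensitivity_scores, measured_errors), 3))
```

A coefficient near 1 would support the premise that the ideal-model ranking tracks real per-kernel degradation; a value near 0 would suggest the reported gains depend on something other than the ranking.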

Circularity Check

0 steps flagged

No circularity: empirical remapping validated by direct measurement on fixed network

The paper presents SRS and DRS as heuristic algorithms that reorder columns according to a sensitivity ranking computed once from the ideal pre-trained VGG16 on CIFAR-10; the reported accuracy figures (2.9% / 2.1% vs. 5.6%) are obtained by applying those fixed reorderings and measuring the resulting MVM error under line-resistance simulation. No equation, fitted parameter, or self-citation is invoked to derive or guarantee those numbers; the outcome is an independent empirical observation on the chosen mapping. The derivation chain therefore contains no self-definitional, fitted-input, or self-citation reduction and remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are stated in the abstract; the approach relies on standard sensitivity analysis and existing crossbar models.

pith-pipeline@v0.9.0 · 5852 in / 1052 out tokens · 18373 ms · 2026-05-25T12:22:41.188279+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays- Part 2: Design Knobs and DNN Accuracy Trends
cs.ET 2024-08 unverdicted novelty 4.0

Simulation study at 7 nm finds FeFET best for large arrays on ResNet-20/CIFAR-10, ReRAM competitive at higher bit-slices on ResNet-50/CIFAR-100, with partial wordline activation and custom ADC levels each raising accu...


[1] [1]

Learning deep architectures for ai,

Y . Bengio et al., “Learning deep architectures for ai,” Foundations and trends R⃝ in Machine Learning , vol. 2, no. 1, pp. 1–127, 2009

work page 2009

[2] [2]

The learning machines,

N. Jones, “The learning machines,” Nature, vol. 505, no. 7482, p. 146, 2014

work page 2014

[3] [3]

Mastering the game of go with deep neural networks and tree search,

D. Silver et al., “Mastering the game of go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016

work page 2016

[4] [4]

Can programming be liberated from the von neumann style?: A functional style and its algebra of programs,

J. Backus, “Can programming be liberated from the von neumann style?: A functional style and its algebra of programs,” Commun. ACM, vol. 21, no. 8, pp. 613–641, Aug. 1978

work page 1978

[5] [5]

An energy-efﬁcient VLSI architecture for pattern recognition via deep embedding of computation in SRAM,

M. Kang, M.-S. Keel, N. R. Shanbhag, S. Eilert, and K. Curewitz, “An energy-efﬁcient VLSI architecture for pattern recognition via deep embedding of computation in SRAM,” in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) . IEEE, may 2014

work page 2014

[6] [6]

A 28 nm con- ﬁgurable memory (tcam/bcam/sram) using push-rule 6t bit cell enabling logic-in-memory,

S. Jeloka, N. B. Akesh, D. Sylvester, and D. Blaauw, “A 28 nm con- ﬁgurable memory (tcam/bcam/sram) using push-rule 6t bit cell enabling logic-in-memory,” IEEE Journal of Solid-State Circuits , vol. 51, no. 4, pp. 1009–1021, April 2016

work page 2016

[7] [7]

In-memory computation of a machine-learning classiﬁer in a standard 6t SRAM array,

J. Zhang, Z. Wang, and N. Verma, “In-memory computation of a machine-learning classiﬁer in a standard 6t SRAM array,” IEEE Journal of Solid-State Circuits , vol. 52, no. 4, pp. 915–924, apr 2017

work page 2017

[8] [8]

A 0.3v VDDmin 4+2t SRAM for searching and in-memory computing using 55nm DDC technology,

Q. Dong, S. Jeloka, M. Saligane, Y . Kim, M. Kawaminami, A. Harada, S. Miyoshi, D. Blaauw, and D. Sylvester, “A 0.3v VDDmin 4+2t SRAM for searching and in-memory computing using 55nm DDC technology,” in 2017 Symposium on VLSI Circuits . IEEE, jun 2017

work page 2017

[9] [9]

X-SRAM: Enabling in- memory boolean computations in CMOS static random access memo- ries,

A. Agrawal, A. Jaiswal, C. Lee, and K. Roy, “X-SRAM: Enabling in- memory boolean computations in CMOS static random access memo- ries,” IEEE Transactions on Circuits and Systems I: Regular Papers, pp. 1–14, 2018

work page 2018

[10] [10]

Xnor-sram: In-memory computing sram macro for binary/ternary deep neural networks,

Z. Jiang, S. Yin, M. Seok, and J.-s. Seo, “Xnor-sram: In-memory computing sram macro for binary/ternary deep neural networks,” in 2018 IEEE Symposium on VLSI Technology . IEEE, 2018, pp. 173–174

work page 2018

[11] [11]

Xcel-RAM: Accelerating Binary Neural Networks in High-Throughput SRAM Compute Arrays

A. Agrawal, A. Jaiswal, B. Han, G. Srinivasan, and K. Roy, “Xcel-ram: Accelerating binary neural networks in high-throughput sram compute arrays,” arXiv preprint arXiv:1807.00343 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[12] [12]

8T SRAM Cell as a Multi-bit Dot Product Engine for Beyond von-Neumann Computing

A. Jaiswal, I. Chakraborty, A. Agrawal, and K. Roy, “8t SRAM cell as a multi-bit dot product engine for beyond von-neumann computing,” arXiv preprint arXiv:1802.08601 , 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

Proposal for an all-spin artiﬁcial neural network: Emulating neural and synaptic functionalities through domain wall motion in ferromagnets,

A. Sengupta, Y . Shim, and K. Roy, “Proposal for an all-spin artiﬁcial neural network: Emulating neural and synaptic functionalities through domain wall motion in ferromagnets,” IEEE Transactions on Biomedical Circuits and Systems , vol. 10, no. 6, pp. 1152–1160, dec 2016

work page 2016

[14] C. Liu, Q. Yang, B. Yan, J. Yang, X. Du, W. Zhu, H. Jiang, Q. Wu, M. Barnell, and H. Li, “A memristor crossbar based computing engine optimized for high speed and accuracy,” in 2016 IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, Jul. 2016

[15] S. B. Eryilmaz, D. Kuzum, R. Jeyasingh, S. Kim, M. BrightSky, C. Lam, and H.-S. P. Wong, “Brain-like associative learning using a nanoscale non-volatile phase change synaptic device array,” Frontiers in Neuroscience, vol. 8, Jul. 2014

[16] M. Prezioso, F. Merrikh-Bayat, B. D. Hoskins, G. C. Adam, K. K. Likharev, and D. B. Strukov, “Training and operation of an integrated neuromorphic network based on metal-oxide memristors,” Nature, vol. 521, no. 7550, pp. 61–64, May 2015

[17] P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie, “PRIME: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE, Jun. 2016

[18] A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, and V. Srikumar, “ISAAC: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars,” in 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). IEEE, Jun. 2016

[19] I. Boybat, M. L. Gallo, S. R. Nandakumar, T. Moraitis, T. Parnell, T. Tuma, B. Rajendran, Y. Leblebici, A. Sebastian, and E. Eleftheriou, “Neuromorphic computing with multi-memristive synapses,” Nature Communications, vol. 9, no. 1, Jun. 2018

[20] A. Ankit, A. Sengupta, P. Panda, and K. Roy, “RESPARC,” in Proceedings of the 54th Annual Design Automation Conference (DAC) 2017. ACM Press, 2017

[21] A. Ankit, I. E. Hajj, S. R. Chalamalasetti, G. Ndu, M. Foltin, R. S. Williams, P. Faraboschi, J. P. Strachan, K. Roy, and D. S. Milojicic, “Puma: A programmable ultra-efficient memristor-based accelerator for machine learning inference,” arXiv preprint arXiv:1901.10351, 2019

[22] Q. Xu, S. Chen, B. Yu, and F. Wu, “Memristive crossbar mapping for neuromorphic computing systems on 3d IC,” in Proceedings of the 2018 on Great Lakes Symposium on VLSI (GLSVLSI). ACM Press, 2018

[23] A. Ankit, A. Sengupta, and K. Roy, “TraNNsformer: Neural network transformation for memristive crossbar based neuromorphic system design,” in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, Nov. 2017

[24] S. Jain, A. Sengupta, K. Roy, and A. Raghunathan, “Rx-caffe: Framework for evaluating and training deep neural networks on resistive crossbars,” arXiv preprint arXiv:1809.00072, 2018

[25] I. Chakraborty, D. Roy, and K. Roy, “Technology aware training in memristive neuromorphic systems for nonideal synaptic crossbars,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 5, pp. 335–344, Oct. 2018

[26] L. Chen, J. Li, Y. Chen, Q. Deng, J. Shen, X. Liang, and L. Jiang, “Accelerator-friendly neural-network training: Learning variations and defects in RRAM crossbar,” in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. IEEE, Mar. 2017

[27] M. E. Fouda, J. Lee, A. M. Eltawil, and F. Kurdahi, “Overcoming crossbar nonidealities in binary neural networks through learning,” in Proceedings of the 14th IEEE/ACM International Symposium on Nanoscale Architectures, ser. NANOARCH ’18. New York, NY, USA: ACM, 2018, pp. 31–33. [Online]. Available: http://doi.acm.org/10.1145/3232195.3232226

[28] J. Hu, C. J. Xue, W. Tseng, Y. He, M. Qiu, and E. H. . Sha, “Reducing write activities on non-volatile memories in embedded cmps via data migration and recomputation,” in Design Automation Conference, June 2010, pp. 350–355

[29] Y. Wang, W. Wen, B. Liu, D. Chiarulli, and H. Li, “Group scissor: Scaling neuromorphic computing design to large neural networks,” in 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), June 2017, pp. 1–6

[30] R. Aluguri and T. Tseng, “Overview of selector devices for 3-d stackable cross point rram arrays,” IEEE Journal of the Electron Devices Society, vol. 4, no. 5, pp. 294–306, Sep. 2016

[31] M. Hu, H. Li, Y. Chen, Q. Wu, G. S. Rose, and R. W. Linderman, “Memristor crossbar-based neuromorphic computing system: A case study,” IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 10, pp. 1864–1878, Oct. 2014

[32] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, “Learning internal representations by error propagation,” California Univ San Diego La Jolla Inst for Cognitive Science, Tech. Rep., 1985

[33] S. Venkataramani, A. Ranjan, K. Roy, and A. Raghunathan, “Axnn: Energy-efficient neuromorphic systems using approximate computing,” in 2014 IEEE/ACM International Symposium on Low Power Electronics and Design (ISLPED), Aug. 2014, pp. 27–32

[34] K. M. Kim, J. J. Yang, J. P. Strachan, E. M. Grafals, N. Ge, N. D. Melendez, Z. Li, and R. S. Williams, “Voltage divider effect for the improvement of variability and endurance of TaOx memristor,” Scientific Reports, vol. 6, no. 1, Feb. 2016

[35] K.-H. Kim, S. Gaba, D. Wheeler, J. M. Cruz-Albrecht, T. Hussain, N. Srinivasa, and W. Lu, “A functional hybrid memristor crossbar-array/CMOS system for data storage and neuromorphic applications,” Nano Letters, vol. 12, no. 1, pp. 389–395, Dec. 2011

[36] J. Li, , S. Salahuddin, and K. Roy, “Variation-tolerant spin-torque transfer (stt) mram array for yield enhancement,” in 2008 IEEE Custom Integrated Circuits Conference, Sep. 2008, pp. 193–196

[37] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017

[38] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014

[39] A. Krizhevsky and G. Hinton, “Learning multiple layers of features from tiny images,” Citeseer, Tech. Rep., 2009