RED: A ReRAM-based Deconvolution Accelerator

Bing Li; Hai (Helen) Li; Yiran Chen; Zichen Fan; Ziru Li

arxiv: 1907.02987 · v1 · pith:46OLDE63new · submitted 2019-07-05 · 💻 cs.ET · cs.LG

Zichen Fan, Ziru Li, Bing Li, Yiran Chen, Hai (Helen) Li

Pith reviewed 2026-05-25 01:31 UTC · model grok-4.3

keywords: ReRAM · deconvolution · accelerator · neural networks · processing-in-memory · zero-skipping · pixel-wise mapping · GANs

The pith

The RED accelerator speeds up deconvolution on ReRAM hardware by 1.15x to 3.69x using pixel-wise mapping and zero-skipping

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deconvolution in neural networks incurs extra latency and energy on ReRAM accelerators because zero-insertion operations create redundancy and require additional steps. RED counters this with two methods: a pixel-wise mapping scheme that removes the redundancy and a zero-skipping data flow that raises parallelism. A sympathetic reader would care if these changes make ReRAM practical for generative adversarial networks and semantic segmentation. If the integration succeeds, deconvolution becomes competitive with standard convolution on the same hardware platform.
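To make the redundancy concrete, here is a minimal sketch in plain Python of a single-channel transposed convolution computed two ways: by scattering each input pixel through the kernel, and by the zero-insertion route (insert zeros, pad, then run a stride-1 convolution with the flipped kernel). The function names, the square input and kernel, and the absence of output padding are assumptions made for illustration; none of this is taken from the paper's implementation.

```python
def deconv_scatter(x, w, stride):
    """Transposed convolution by scattering: each input pixel adds a scaled kernel copy."""
    n, k = len(x), len(w)
    m = (n - 1) * stride + k  # output side length, no padding
    out = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    return out


def zero_insert(x, k, stride):
    """Place input pixels `stride` apart and pad k-1 zeros on every border."""
    n = len(x)
    size = (n - 1) * stride + 1 + 2 * (k - 1)
    z = [[0] * size for _ in range(size)]
    for i in range(n):
        for j in range(n):
            z[k - 1 + i * stride][k - 1 + j * stride] = x[i][j]
    return z


def deconv_zero_insert(x, w, stride):
    """Same result via zero insertion followed by a stride-1 convolution."""
    k = len(w)
    z = zero_insert(x, k, stride)
    m = len(z) - k + 1
    out = [[0] * m for _ in range(m)]
    for u in range(m):
        for v in range(m):
            acc = 0
            for a in range(k):
                for b in range(k):
                    # Flipped kernel: tap (a, b) reads weight (k-1-a, k-1-b).
                    acc += z[u + a][v + b] * w[k - 1 - a][k - 1 - b]
            out[u][v] = acc
    return out


def inserted_zero_ratio(n, k, stride):
    """Fraction of the zero-inserted map that holds structural zeros."""
    size = (n - 1) * stride + 1 + 2 * (k - 1)
    return 1 - (n * n) / (size * size)


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(deconv_scatter(x, w, 2) == deconv_zero_insert(x, w, 2))
    print(inserted_zero_ratio(4, 3, 2))
```

Under these assumptions a 4x4 input with a 3x3 kernel and stride 2 expands to an 11x11 map in which only 16 of 121 positions carry real data, so roughly 87% of what the crossbar would be fed is structural zeros.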

Core claim

RED integrates the pixel-wise mapping scheme for reducing redundancy caused by zero-inserting operations and the zero-skipping data flow for increasing the computation parallelism, delivering speedups of 1.15x to 3.69x and energy reductions of 8% to 88.36% versus prior ReRAM designs.
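The two ranges use different conventions (a ratio for latency, a percentage for energy), which makes them hard to compare at a glance. The short sketch below converts between the two, assuming the usual definitions speedup = baseline latency / new latency and energy reduction = 1 - new energy / baseline energy. These definitions and the helper names are assumptions; the abstract does not spell them out, nor does it say whether the extremes of both ranges come from the same benchmark.

```python
def latency_reduction(speedup):
    """Fraction of baseline latency removed, assuming speedup = t_base / t_new."""
    return 1.0 - 1.0 / speedup


def energy_ratio(reduction):
    """Baseline-to-new energy ratio, assuming reduction = 1 - e_new / e_base."""
    return 1.0 / (1.0 - reduction)


if __name__ == "__main__":
    for s in (1.15, 3.69):
        print(f"speedup {s}x -> latency cut by {latency_reduction(s):.1%}")
    for r in (0.08, 0.8836):
        print(f"energy reduction {r:.2%} -> {energy_ratio(r):.2f}x less energy")
```

Under those definitions the reported speedups correspond to roughly 13% to 73% lower latency, and the reported energy reductions to roughly 1.09x to 8.59x lower energy.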

What carries the argument

The RED accelerator design that pairs pixel-wise mapping with zero-skipping data flow to handle deconvolution directly in ReRAM crossbars.
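As a software analogy only, the idea of skipping inserted zeros can be sketched by noting that each output pixel of a stride-s transposed convolution touches only the kernel taps whose row and column indices match the pixel's position modulo s. The code below gathers exactly those taps and counts multiply-accumulates. This phase decomposition is a generic technique swapped in for illustration; it is not claimed to be the crossbar mapping or data flow that RED uses, and all names are hypothetical.

```python
def deconv_scatter(x, w, stride):
    """Reference transposed convolution (scatter form), used only for checking."""
    n, k = len(x), len(w)
    m = (n - 1) * stride + k
    out = [[0] * m for _ in range(m)]
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
    return out


def deconv_by_phase(x, w, stride):
    """Gather form that visits only kernel taps aligned with real input pixels.

    Returns the output map and the number of multiply-accumulates performed.
    """
    n, k = len(x), len(w)
    m = (n - 1) * stride + k
    out = [[0] * m for _ in range(m)]
    macs = 0
    for u in range(m):
        for v in range(m):
            acc = 0
            # Only taps with a % stride == u % stride can hit a real input row.
            for a in range(u % stride, k, stride):
                i = (u - a) // stride
                if not 0 <= i < n:
                    continue
                for b in range(v % stride, k, stride):
                    j = (v - b) // stride
                    if 0 <= j < n:
                        acc += x[i][j] * w[a][b]
                        macs += 1
            out[u][v] = acc
    return out, macs


if __name__ == "__main__":
    x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    w = [[1, -1, 2], [0, 3, 1], [2, 1, -2]]
    out, macs = deconv_by_phase(x, w, 2)
    print(out == deconv_scatter(x, w, 2), macs)
```

In this sketch the work is n²k² multiply-accumulates, whereas sliding the full kernel over the zero-inserted map costs m²k² with m = (n - 1)·s + k. For a 4x4 input, 3x3 kernel and stride 2 that is 144 versus 729 products. With stride 2 there are four distinct (row, column) phases, each using a different subset of the kernel.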

If this is right

Deconvolution layers can run on ReRAM without the previous long latency from zero padding.
Energy per operation drops substantially for networks that rely on transposed convolutions.
Computation parallelism rises because skipped zeros no longer occupy cycles or array space.
The same ReRAM substrate can now support both convolution and deconvolution workloads at comparable efficiency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar mapping and skipping tactics could apply to other neural-network operations that introduce structured sparsity or padding.
Edge devices running generative models might become feasible if the energy reductions hold across full networks.
Designers could test whether combining RED with existing convolution accelerators yields further system-level gains.

Load-bearing premise

The pixel-wise mapping and zero-skipping dataflow can be realized in ReRAM hardware without new latency, area, or accuracy penalties that erase the reported gains.

What would settle it

A hardware prototype of RED whose measured end-to-end latency or energy exceeds that of the baseline accelerator once mapping and skipping overheads are included.

Figures

Figures reproduced from arXiv: 1907.02987 by Bing Li, Hai (Helen) Li, Yiran Chen, Zichen Fan, Ziru Li.

**Figure 4.** The zero redundancy ratio in zero-padding deconvolution changing …

**Figure 3.** Deconvolution on ReRAM-based accelerator.

**Figure 5.** The illustration of RED architecture (a), pixel-wise mapping (b) and zero-skipping data flow (c).

**Figure 6.** The four computation modes in deconvolution when the kernel size …

**Figure 7.** (a) shows that RED annexes the advantages of both padding-free and zero-padding designs. It acquires the lowest total latency and achieves highest speedup across all the benchmarks. The performance improvement of RED benefits from two aspects: 1) it eliminates the zero redundancy in input vectors and diminishes the number of cycles; and 2) the size of output vectors is the same as the zero-padding design, …

**Figure 9.** The area comparison.

Conclusion (excerpt): This work introduces RED, a high-performance and energy-efficient ReRAM-based deconvolution accelerator. Through the optimization of the mapping design and data flow, RED eliminates the redundant computations and avoids the overhead of the incremental periphery circuitry. Experimental evaluation shows that RED outperforms the existing ReRAM-based accelerators for the com…

Original abstract

Deconvolution has been widespread in neural networks. For example, it is essential for performing unsupervised learning in generative adversarial networks or constructing fully convolutional networks for semantic segmentation. Resistive RAM (ReRAM)-based processing-in-memory architecture has been widely explored in accelerating convolutional computation and demonstrates good performance. Performing deconvolution on existing ReRAM-based accelerator designs, however, suffers from long latency and high energy consumption because deconvolutional computation includes not only convolution but also extra add-on operations. To realize the more efficient execution for deconvolution, we analyze its computation requirement and propose a ReRAM-based accelerator design, namely, RED. More specifically, RED integrates two orthogonal methods, the pixel-wise mapping scheme for reducing redundancy caused by zero-inserting operations and the zero-skipping data flow for increasing the computation parallelism and therefore improving performance. Experimental evaluations show that compared to the state-of-the-art ReRAM-based accelerator, RED can speed up operation 3.69x~1.15x and reduce 8%~88.36% energy consumption.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes RED, a ReRAM-based processing-in-memory accelerator for deconvolution operations. It introduces two orthogonal techniques: a pixel-wise mapping scheme to reduce redundancy from zero-insertion operations and a zero-skipping dataflow to increase computation parallelism. The central claim is that these methods enable 3.69×–1.15× speedup and 8%–88.36% energy reduction relative to prior ReRAM-based accelerators for workloads in GANs and FCNs.

Significance. If the net gains hold after accounting for implementation overheads, the work would be significant for addressing a known inefficiency in ReRAM PIM designs when handling deconvolution, which is increasingly important in generative and segmentation networks. The orthogonal combination of mapping and skipping is a conceptual strength.

major comments (2)

[Abstract] Abstract: the reported 3.69×–1.15× speedup and 8%–88.36% energy savings are presented without any quantitative breakdown of added latency, area, or power from the pixel-wise mapping circuitry and zero-skipping control logic; this is load-bearing because the central claim requires that these mechanisms produce net improvements rather than being offset by new overheads.
[Abstract] Abstract and design description: no benchmark workload details, mapping parameters, crossbar utilization figures, or error bars are supplied to allow verification that the evaluated deconvolution patterns match those in target applications (e.g., transposed convolutions in GAN generators); without this, the range of reported gains cannot be assessed for generality.

minor comments (1)

Notation for the two proposed schemes should be introduced with consistent abbreviations on first use to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity in the abstract.

Point-by-point responses

Referee: [Abstract] Abstract: the reported 3.69×–1.15× speedup and 8%–88.36% energy savings are presented without any quantitative breakdown of added latency, area, or power from the pixel-wise mapping circuitry and zero-skipping control logic; this is load-bearing because the central claim requires that these mechanisms produce net improvements rather than being offset by new overheads.

Authors: We agree that the abstract should explicitly note that the reported figures are net gains. The cycle-accurate evaluations model and include the latency, area, and power of the pixel-wise mapping circuitry and zero-skipping control logic when comparing against prior accelerators. We will revise the abstract to state that the improvements account for these overheads. revision: yes

Referee: [Abstract] Abstract and design description: no benchmark workload details, mapping parameters, crossbar utilization figures, or error bars are supplied to allow verification that the evaluated deconvolution patterns match those in target applications (e.g., transposed convolutions in GAN generators); without this, the range of reported gains cannot be assessed for generality.

Authors: The evaluation section provides workload details from GANs and FCNs, mapping parameters, and crossbar utilization. We will revise the abstract to summarize the workloads and direct readers to the evaluation section for parameters and figures. Error bars are absent because results are from deterministic simulations; we can add a sensitivity discussion if needed. revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on external experimental comparisons

Full rationale

The paper proposes two hardware techniques (pixel-wise mapping and zero-skipping dataflow) for ReRAM-based deconvolution acceleration and reports speedups/energy savings from experimental evaluations against prior accelerators. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text that would reduce the reported gains to quantities defined by the design itself. The derivation chain consists of analysis of deconvolution requirements followed by independent implementation and benchmarking, making the result self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. Design parameters such as ReRAM array dimensions or mapping granularity are implicit but not enumerated.

pith-pipeline@v0.9.0 · 5718 in / 1270 out tokens · 27972 ms · 2026-05-25T01:31:46.688115+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 3 internal anchors

[1] Jiajun Wu et al. Learning a probabilistic latent space of object shapes via 3D generative-adversarial modeling. In NIPS, pages 82–90, 2016.

[2] Raymond Yeh et al. Semantic image inpainting with perceptual and contextual losses. arXiv preprint arXiv:1607.07539. Listed title: Semantic Image Inpainting with Deep Generative Models. (internal anchor)

[3] Jonathan Long et al. Fully convolutional networks for semantic segmentation. In CVPR, pages 3431–3440, 2015.

[4] Shifeng Zhang et al. Single-shot refinement neural network for object detection. In IEEE CVPR, 2018.

[5] Ping Chi et al. Prime: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In SIGARCH Comput. Archit. News, volume 44, pages 27–39, 2016.

[6] Ming Cheng et al. Time: A training-in-memory architecture for RRAM-based deep neural networks. TCAD, 2018.

[7] Ali Shafiee et al. Isaac: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars. SIGARCH Comput. Archit. News, 44(3):14–26, 2016.

[8] Linghao Song et al. Pipelayer: A pipelined ReRAM-based accelerator for deep learning. In HPCA, pages 541–552, 2017.

[9] Ximing Qiao et al. Atomlayer: A universal ReRAM-based CNN accelerator with atomic layer computation. In DAC.

[10] Bing Li et al. ReRAM-based accelerator for deep learning. In DATE, pages 815–820, 2018.

[11] Dawen Xu et al. FCN-engine: Accelerating deconvolutional layers in classic CNN processors. In ICCAD, 2018.

[12] Fan Chen et al. Regan: A pipelined ReRAM-based accelerator for generative adversarial networks. In ASP-DAC.

[13] Takeru Miyato et al. Spectral normalization for generative adversarial networks. arXiv:1802.05957, 2018. (internal anchor)

[14] Alec Radford et al. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv:1511.06434, 2015. (internal anchor)

[15] Tim Salimans et al. Improved techniques for training GANs. In NIPS, pages 2234–2242, 2016.

[16] Pai Yu Chen et al. Neurosim+: An integrated device-to-algorithm framework for benchmarking synaptic devices and array architectures. In IEDM, pages 6–1, 2018.
