arxiv: 1907.02987 · v1 · pith:46OLDE63new · submitted 2019-07-05 · 💻 cs.ET · cs.LG

RED: A ReRAM-based Deconvolution Accelerator

Pith reviewed 2026-05-25 01:31 UTC · model grok-4.3

classification 💻 cs.ET cs.LG
keywords ReRAM · deconvolution · accelerator · neural networks · processing-in-memory · zero-skipping · pixel-wise mapping · GANs

The pith

RED accelerator speeds deconvolution by up to 3.69x on ReRAM hardware using pixel-wise mapping and zero skipping

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that deconvolution in neural networks incurs extra latency and energy on ReRAM accelerators because zero insertion creates redundant computation and requires additional steps. RED counters this with two methods: a pixel-wise mapping scheme that removes the redundancy and a zero-skipping data flow that raises parallelism. A sympathetic reader would care whether these changes make ReRAM practical for generative adversarial networks and semantic segmentation. If the integration succeeds, deconvolution becomes competitive with standard convolution on the same hardware platform.
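To make the redundancy concrete, here is a minimal sketch of the textbook zero-insertion step, assuming a square input, a square kernel, and `kernel - 1 - padding >= 0`. The function names `zero_insert` and `zero_ratio` and the layer shapes are illustrative choices, not taken from the paper.

```python
def zero_insert(x, stride, kernel, padding=0):
    """Upsample a square 2D input the way zero-insertion deconvolution does.

    (stride - 1) zeros go between neighbouring pixels and a border of
    (kernel - 1 - padding) zeros goes around the result.
    Assumes kernel - 1 - padding >= 0.
    """
    n = len(x)
    border = kernel - 1 - padding
    side = (n - 1) * stride + 1 + 2 * border
    up = [[0] * side for _ in range(side)]
    for i in range(n):
        for j in range(n):
            up[border + i * stride][border + j * stride] = x[i][j]
    return up


def zero_ratio(n, stride, kernel, padding=0):
    """Fraction of positions in the upsampled input that are inserted zeros."""
    border = kernel - 1 - padding
    side = (n - 1) * stride + 1 + 2 * border
    return 1 - (n * n) / (side * side)


if __name__ == "__main__":
    for row in zero_insert([[1, 2], [3, 4]], stride=2, kernel=3):
        print(row)
    # Hypothetical layer shape: 4x4 input, 4x4 kernel, stride 2, padding 1.
    print(round(zero_ratio(4, 2, 4, 1), 4))
```

For the hypothetical 4x4 layer above, 105 of the 121 positions fed to the convolution are inserted zeros, about 87 percent. Every one of them still occupies an input cycle on a design that executes the zero-inserted form directly.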

Core claim

RED integrates the pixel-wise mapping scheme for reducing redundancy caused by zero-inserting operations and the zero-skipping data flow for increasing computation parallelism, delivering speedups of 1.15x to 3.69x and energy reductions of 8% to 88.36% versus prior ReRAM designs.

What carries the argument

The RED accelerator design that pairs pixel-wise mapping with zero-skipping data flow to handle deconvolution directly in ReRAM crossbars.
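The two baseline formulations that a direct deconvolution design has to reconcile can be sketched in a few lines. This is an illustration of the textbook zero-insertion and scatter-add (padding-free) forms for a square input and kernel with no padding, not a model of RED's crossbar mapping. The function names and the multiply-accumulate (MAC) counters are assumptions made for the example.

```python
def deconv_zero_insert(x, w, stride):
    """Transposed convolution as zero insertion plus a stride-1 convolution."""
    n, k = len(x), len(w)
    border = k - 1
    side = (n - 1) * stride + 1 + 2 * border
    up = [[0] * side for _ in range(side)]
    for i in range(n):
        for j in range(n):
            up[border + i * stride][border + j * stride] = x[i][j]
    out_side = side - k + 1  # equals (n - 1) * stride + k
    out = [[0] * out_side for _ in range(out_side)]
    macs = 0
    for r in range(out_side):
        for c in range(out_side):
            for a in range(k):
                for b in range(k):
                    # The flipped kernel makes the sliding window a transposed conv.
                    out[r][c] += up[r + a][c + b] * w[k - 1 - a][k - 1 - b]
                    macs += 1  # counted even when the operand is an inserted zero
    return out, macs


def deconv_scatter(x, w, stride):
    """Padding-free form: each input pixel scatters a scaled kernel copy."""
    n, k = len(x), len(w)
    out_side = (n - 1) * stride + k
    out = [[0] * out_side for _ in range(out_side)]
    macs = 0
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    out[i * stride + a][j * stride + b] += x[i][j] * w[a][b]
                    macs += 1
    return out, macs


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, -1], [2, 1, 0], [0, 1, 3]]
    dense, dense_macs = deconv_zero_insert(x, w, stride=2)
    sparse, sparse_macs = deconv_scatter(x, w, stride=2)
    print(dense == sparse, dense_macs, sparse_macs)
```

Both functions return the same 5x5 output for this toy case, but the zero-inserted form issues 225 MACs against 36 for the scatter form. The scatter form, in turn, produces overlapping partial sums that must be accumulated afterwards, which is the kind of add-on work the abstract attributes to deconvolution.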

If this is right

  • Deconvolution layers can run on ReRAM without the previous long latency from zero padding.
  • Energy per operation drops substantially for networks that rely on transposed convolutions.
  • Computation parallelism rises because skipped zeros no longer occupy cycles or array space.
  • The same ReRAM substrate can now support both convolution and deconvolution workloads at comparable efficiency.
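A software analogue of skipping the inserted zeros is sketched below, assuming a square input and kernel with no padding. Each output pixel only visits the kernel taps that line up with real input pixels, and output pixels that share the same (row mod stride, column mod stride) phase share one sub-kernel. With stride 2 this gives four phases, which may or may not coincide with the four computation modes shown in Figure 6. The names `deconv_zero_skip` and `phase_subkernel` are hypothetical.

```python
def phase_subkernel(w, stride, pr, pc):
    """Kernel taps that can meet real inputs for output phase (pr, pc)."""
    return [row[pc::stride] for row in w[pr::stride]]


def deconv_zero_skip(x, w, stride):
    """Transposed convolution that never multiplies by an inserted zero."""
    n, k = len(x), len(w)
    out_side = (n - 1) * stride + k
    out = [[0] * out_side for _ in range(out_side)]
    macs = 0
    for r in range(out_side):
        for c in range(out_side):
            # Only taps with a == r (mod stride) align with a real input row.
            for a in range(r % stride, k, stride):
                i = (r - a) // stride
                if not 0 <= i < n:
                    continue
                for b in range(c % stride, k, stride):
                    j = (c - b) // stride
                    if not 0 <= j < n:
                        continue
                    out[r][c] += x[i][j] * w[a][b]
                    macs += 1
    return out, macs


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    w = [[1, 0, -1], [2, 1, 0], [0, 1, 3]]
    out, macs = deconv_zero_skip(x, w, stride=2)
    for row in out:
        print(row)
    print("MACs:", macs)
    for pr in range(2):
        for pc in range(2):
            print((pr, pc), phase_subkernel(w, 2, pr, pc))
```

Because every output pixel is computed independently from its own small sub-kernel, there are no overlapping partial sums to merge, and the MAC count equals the number of useful products (36 for this 2x2 input and 3x3 kernel).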

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar mapping and skipping tactics could apply to other neural-network operations that introduce structured sparsity or padding.
  • Edge devices running generative models might become feasible if the energy reductions hold across full networks.
  • Designers could test whether combining RED with existing convolution accelerators yields further system-level gains.

Load-bearing premise

The pixel-wise mapping and zero-skipping dataflow can be realized in ReRAM hardware without new latency, area, or accuracy penalties that erase the reported gains.

What would settle it

A hardware prototype of RED whose measured end-to-end latency or energy exceeds that of the baseline accelerator once mapping and skipping overheads are included.
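One way to state this test precisely, using hypothetical symbols that do not appear in the paper: let $T_{\text{base}}$ be the baseline accelerator's end-to-end latency, $T_{\text{RED}}$ the latency of RED's crossbar computation alone, and $T_{\text{ovh}}$ the latency added by the mapping and skipping logic, all positive.

```latex
S_{\text{net}} = \frac{T_{\text{base}}}{T_{\text{RED}} + T_{\text{ovh}}},
\qquad
S_{\text{net}} \le 1 \iff T_{\text{ovh}} \ge T_{\text{base}} - T_{\text{RED}}
```

The same break-even condition applies to energy with $E$ in place of $T$. At the low end of the reported range (1.15x), the margin $T_{\text{base}} - T_{\text{RED}}$ is smallest, so that is where unmodeled overhead would matter most.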

Figures

Figures reproduced from arXiv: 1907.02987 by Bing Li, Hai (Helen) Li, Yiran Chen, Zichen Fan, Ziru Li.

Figure 2: Pseudocode of traditional deconvolution algorithms.
Figure 4: The zero redundancy ratio in zero-padding deconvolution changing …
Figure 3: Deconvolution on ReRAM-based accelerator.
Figure 5: The illustration of RED architecture (a), pixel-wise mapping (b) and zero-skipping data flow (c).
Figure 6: The four computation modes in deconvolution when the kernel size …
Figure 7: (a) shows that RED annexes the advantages of both padding-free and zero-padding designs. It acquires the lowest total latency and achieves highest speedup across all the benchmarks. The performance improvement of RED benefits from two aspects: 1) it eliminates the zero redundancy in input vectors and diminishes the number of cycles; and 2) the size of output vectors is the same as the zero-padding design, …
Figure 9: The area comparison. V. CONCLUSION: This work introduces RED, a high-performance and energy-efficient ReRAM-based deconvolution accelerator. Through the optimization of the mapping design and data flow, RED eliminates the redundant computations and avoids the overhead of the incremental periphery circuitry. Experimental evaluation shows that RED outperforms the existing ReRAM-based accelerators for the com…
Original abstract

Deconvolution has been widespread in neural networks. For example, it is essential for performing unsupervised learning in generative adversarial networks or constructing fully convolutional networks for semantic segmentation. Resistive RAM (ReRAM)-based processing-in-memory architecture has been widely explored in accelerating convolutional computation and demonstrates good performance. Performing deconvolution on existing ReRAM-based accelerator designs, however, suffers from long latency and high energy consumption because deconvolutional computation includes not only convolution but also extra add-on operations. To realize more efficient execution for deconvolution, we analyze its computation requirement and propose a ReRAM-based accelerator design, namely, RED. More specifically, RED integrates two orthogonal methods, the pixel-wise mapping scheme for reducing redundancy caused by zero-inserting operations and the zero-skipping data flow for increasing the computation parallelism and therefore improving performance. Experimental evaluations show that compared to the state-of-the-art ReRAM-based accelerator, RED can speed up operation 3.69x~1.15x and reduce 8%~88.36% energy consumption.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes RED, a ReRAM-based processing-in-memory accelerator for deconvolution operations. It introduces two orthogonal techniques: a pixel-wise mapping scheme to reduce redundancy from zero-insertion operations and a zero-skipping dataflow to increase computation parallelism. The central claim is that these methods enable 3.69×–1.15× speedup and 8%–88.36% energy reduction relative to prior ReRAM-based accelerators for workloads in GANs and FCNs.

Significance. If the net gains hold after accounting for implementation overheads, the work would be significant for addressing a known inefficiency in ReRAM PIM designs when handling deconvolution, which is increasingly important in generative and segmentation networks. The orthogonal combination of mapping and skipping is a conceptual strength.

major comments (2)
  1. [Abstract] The reported 3.69×–1.15× speedup and 8%–88.36% energy savings are presented without any quantitative breakdown of added latency, area, or power from the pixel-wise mapping circuitry and zero-skipping control logic; this is load-bearing because the central claim requires that these mechanisms produce net improvements rather than being offset by new overheads.
  2. [Abstract] Abstract and design description: no benchmark workload details, mapping parameters, crossbar utilization figures, or error bars are supplied to allow verification that the evaluated deconvolution patterns match those in target applications (e.g., transposed convolutions in GAN generators); without this, the range of reported gains cannot be assessed for generality.
minor comments (1)
  1. Notation for the two proposed schemes should be introduced with consistent abbreviations on first use to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to improve clarity in the abstract.

  1. Referee: [Abstract] The reported 3.69×–1.15× speedup and 8%–88.36% energy savings are presented without any quantitative breakdown of added latency, area, or power from the pixel-wise mapping circuitry and zero-skipping control logic; this is load-bearing because the central claim requires that these mechanisms produce net improvements rather than being offset by new overheads.

    Authors: We agree that the abstract should explicitly note that the reported figures are net gains. The cycle-accurate evaluations model and include the latency, area, and power of the pixel-wise mapping circuitry and zero-skipping control logic when comparing against prior accelerators. We will revise the abstract to state that the improvements account for these overheads. revision: yes

  2. Referee: [Abstract] Abstract and design description: no benchmark workload details, mapping parameters, crossbar utilization figures, or error bars are supplied to allow verification that the evaluated deconvolution patterns match those in target applications (e.g., transposed convolutions in GAN generators); without this, the range of reported gains cannot be assessed for generality.

    Authors: The evaluation section provides workload details from GANs and FCNs, mapping parameters, and crossbar utilization. We will revise the abstract to summarize the workloads and direct readers to the evaluation section for parameters and figures. Error bars are absent because results are from deterministic simulations; we can add a sensitivity discussion if needed. revision: partial

Circularity Check

0 steps flagged

No significant circularity; claims rest on external experimental comparisons

The paper proposes two hardware techniques (pixel-wise mapping and zero-skipping dataflow) for ReRAM-based deconvolution acceleration and reports speedups/energy savings from experimental evaluations against prior accelerators. No equations, fitted parameters, self-citations, or uniqueness theorems appear in the provided text that would reduce the reported gains to quantities defined by the design itself. The derivation chain consists of analysis of deconvolution requirements followed by independent implementation and benchmarking, making the result self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. Design parameters such as ReRAM array dimensions or mapping granularity are implicit but not enumerated.
