Recurrent Neural Networks Hardware Implementation on FPGA

Andre Xian Ming Chang, Berin Martini, Eugenio Culurciello


classification cs.NE

keywords hardware, recurrent, fpga, implementation, memory, networks, neural, offer


Recurrent Neural Networks (RNNs) can retain memory and learn data sequences. Because of their recurrent nature, it is sometimes hard to parallelize all of their computations on conventional hardware. CPUs do not currently offer large parallelism, while GPUs offer limited parallelism due to the sequential components of RNN models. In this paper we present a hardware implementation of a Long Short-Term Memory (LSTM) recurrent network on the programmable logic of the Zynq 7020 FPGA from Xilinx. We implemented an RNN with $2$ layers and $128$ hidden units in hardware and tested it using a character-level language model. The implementation is more than $21\times$ faster than the ARM CPU embedded on the Zynq 7020 FPGA. This work can potentially evolve into an RNN co-processor for future mobile devices.
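As a software illustration of the model size mentioned in the abstract, the following is a minimal NumPy sketch of a 2-layer LSTM with 128 hidden units stepping over a character sequence. It uses the standard LSTM gate equations and is not the authors' hardware design; the vocabulary size, the random weights, the gate ordering, and the names `lstm_step`, `init_layer`, and `run_stack` are all assumptions made for the example. The dependence of each step on `h_prev` and `c_prev` is the sequential component that limits parallelism across time.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, b):
    """One standard LSTM time step.

    W has shape (4*H, X+H) and b has shape (4*H,), where H is the hidden
    size and X the input size. Gate order (i, f, o, g) is an assumption.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate
    f = sigmoid(z[H:2 * H])      # forget gate
    o = sigmoid(z[2 * H:3 * H])  # output gate
    g = np.tanh(z[3 * H:4 * H])  # candidate cell update
    c = f * c_prev + i * g       # new cell state
    h = o * np.tanh(c)           # new hidden state
    return h, c

def init_layer(rng, input_size, hidden_size):
    # Random placeholder weights; a trained model would load real ones.
    W = rng.standard_normal((4 * hidden_size, input_size + hidden_size)) * 0.1
    b = np.zeros(4 * hidden_size)
    return W, b

def run_stack(inputs, layers, hidden_size):
    """Feed a sequence through stacked LSTM layers, one time step at a time."""
    states = [(np.zeros(hidden_size), np.zeros(hidden_size)) for _ in layers]
    for x in inputs:
        for k, (W, b) in enumerate(layers):
            h, c = lstm_step(x, states[k][0], states[k][1], W, b)
            states[k] = (h, c)
            x = h  # output of layer k is the input of layer k+1
    return states[-1][0]

HIDDEN = 128   # hidden units per layer, as stated in the abstract
VOCAB = 65     # hypothetical character vocabulary size
rng = np.random.default_rng(0)
layers = [init_layer(rng, VOCAB, HIDDEN), init_layer(rng, HIDDEN, HIDDEN)]

text_ids = [3, 17, 42, 8]          # hypothetical character indices
one_hot = np.eye(VOCAB)[text_ids]  # one one-hot row per character
h_top = run_stack(one_hot, layers, HIDDEN)
```

Within a single time step the four gate computations are independent matrix-vector products followed by element-wise operations, which is the kind of work that maps naturally onto parallel hardware, while the loop over time steps has to run in order.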



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications
cs.LG 2026-05 unverdicted novelty 7.0

Cumulative state updates in CMRU restore gradient flow through time in quantized bistable RNNs, yielding more stable convergence and competitive or superior performance versus LRUs and minGRUs on long-range sequence tasks.