Cross-Layer Co-Optimized LSTM Accelerator for Real-Time Gait Analysis

Alar Kuusik; Jaan Raik; Levent Aksoy; Mohammad Eslami; Mohammad Hasan Ahmadilivani

arxiv: 2604.13543 · v1 · submitted 2026-04-15 · 💻 cs.AR · cs.LG



Pith reviewed 2026-05-10 12:33 UTC · model grok-4.3

classification 💻 cs.AR cs.LG

keywords: LSTM accelerator · ASIC design · gait analysis · real-time detection · hardware quantization · cross-layer optimization · abnormality detection · edge computing


The pith

A cross-layer co-optimized LSTM accelerator on ASIC detects gait abnormalities 4.05 times faster than the application requires, with its highest-accuracy layout occupying 0.325 mm² of silicon.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that an optimization flow spanning software-level bit-width selection with hardware-aware quantization, register-transfer-level (RTL) architecture variants, and physical layout generation can produce an ASIC LSTM accelerator that meets the strict real-time, power, and area limits of gait analysis for fall prevention. LSTM networks handle sequential sensor data from walking well, yet their computational demands usually prevent real-time execution on edge hardware without custom silicon. Whether trading a small accuracy loss for area savings or prioritizing peak accuracy, the designs reach the required speed while fitting in a fraction of a square millimeter. A reader would care because reliable on-device detection could enable continuous monitoring in wearables or implants without cloud offload or excessive battery drain.
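For orientation, the textbook LSTM cell updates its state at each time step $t$ as below. This is one common formulation, assumed here for illustration; the exact variant drawn in the paper's Figure 1(b) may differ in details such as activation choices. Here $x_t$ is the input sensor sample, $h_t$ the hidden state, $c_t$ the cell state, $\sigma$ the logistic sigmoid, and $\odot$ element-wise multiplication. The eight matrix-vector products are the bulk of the arithmetic an accelerator's datapath must carry, which is why parameter and operand bit-widths drive silicon area.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
g_t &= \tanh(W_g x_t + U_g h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```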

Core claim

Through bit-width optimization at the software level with hardware-aware quantization, RTL design exploration, and layout generation, the work produces the first cross-layer co-optimized LSTM accelerator for ASIC-based real-time gait abnormality detection. In 65 nm technology the highest-accuracy layout occupies 0.325 mm² while the area-optimized alternative is 15.4 percent smaller; both run 4.05 times faster than the application requirement.
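Assuming the 15.4 percent reduction is measured against the 0.325 mm² highest-accuracy layout (the abstract implies this baseline), the area-optimized die works out to roughly:

```latex
A_{\text{area-opt}} \approx (1 - 0.154) \times 0.325\ \text{mm}^2 = 0.846 \times 0.325\ \text{mm}^2 \approx 0.275\ \text{mm}^2
```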

What carries the argument

Cross-layer co-optimization, which integrates software quantization and bit-width reduction with RTL architecture variants and physical synthesis to balance LSTM gate complexity against detection accuracy and silicon area.
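To make the bit-width side of this trade-off concrete, here is a minimal sketch of signed fixed-point quantization with saturation, sweeping the word length over a hypothetical set of weights. The format split (1 sign bit, 1 integer bit) and the uniform random "weights" are assumptions for illustration, not the paper's choices; worst-case weight error is also only a crude proxy, since the paper evaluates accuracy and F1-score degradation directly (its Figure 4).

```python
import random

def quantize(x: float, total_bits: int, frac_bits: int) -> float:
    """Round x to signed two's-complement fixed point, saturating at the range limits.

    total_bits includes the sign bit; frac_bits of them sit after the binary point.
    """
    scale = 1 << frac_bits
    q_min = -(1 << (total_bits - 1))
    q_max = (1 << (total_bits - 1)) - 1
    q = max(q_min, min(q_max, round(x * scale)))  # round to nearest, then clamp
    return q / scale

def max_abs_error(values, total_bits: int, frac_bits: int) -> float:
    """Worst-case rounding error over a set of values (a crude proxy for accuracy loss)."""
    return max(abs(v - quantize(v, total_bits, frac_bits)) for v in values)

# Hypothetical stand-in for trained LSTM weights: uniform in [-1, 1].
random.seed(0)
weights = [random.uniform(-1.0, 1.0) for _ in range(1000)]

for bits in (16, 12, 8, 6, 4):
    frac = bits - 2  # hypothetical format: 1 sign bit, 1 integer bit, rest fractional
    print(f"{bits:2d}-bit: max |error| = {max_abs_error(weights, bits, frac):.5f}")
```

Each bit removed roughly doubles the worst-case rounding error (bounded by half a step, $2^{-(\text{frac}+1)}$), while shrinking weight memory and multiplier width; cross-layer exploration looks for the point where detection quality stops tolerating the loss.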

If this is right

Real-time gait monitoring becomes possible inside power- and area-constrained wearable or medical devices.
Latency for step-abnormality decisions falls well below the maximum allowed for continuous patient-safety applications.
Designers can select between maximum detection accuracy and the smallest possible die area depending on the target product.
The same optimization steps apply directly to other recurrent networks processing time-series medical signals.
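The accuracy-versus-area choice can be framed as picking from a Pareto front of design points. The sketch below uses purely hypothetical configurations and numbers (not the paper's results) to show the two selection rules: take the most accurate design, or take the smallest design that clears an accuracy floor.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DesignPoint:
    name: str         # hypothetical configuration label
    area_mm2: float   # post-layout die area
    accuracy: float   # detection accuracy as a fraction in [0, 1]

def dominates(q: DesignPoint, p: DesignPoint) -> bool:
    """q dominates p if it is no larger, no less accurate, and strictly better in one of the two."""
    return (q.area_mm2 <= p.area_mm2 and q.accuracy >= p.accuracy
            and (q.area_mm2 < p.area_mm2 or q.accuracy > p.accuracy))

def pareto_front(points):
    """Non-dominated design points, sorted from smallest to largest area."""
    front = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(front, key=lambda p: p.area_mm2)

def pick_smallest(points, min_accuracy: float):
    """Smallest-area point meeting an accuracy floor, or None if none qualifies."""
    eligible = [p for p in points if p.accuracy >= min_accuracy]
    return min(eligible, key=lambda p: p.area_mm2, default=None)

# Purely illustrative numbers, not results from the paper.
points = [
    DesignPoint("cfg_a", 0.40, 0.95),
    DesignPoint("cfg_b", 0.30, 0.94),
    DesignPoint("cfg_c", 0.28, 0.90),
    DesignPoint("cfg_d", 0.35, 0.93),  # dominated by cfg_b
]
print([p.name for p in pareto_front(points)])          # ['cfg_c', 'cfg_b', 'cfg_a']
print(max(points, key=lambda p: p.accuracy).name)      # accuracy-first choice: cfg_a
print(pick_smallest(points, min_accuracy=0.93).name)   # area-first choice: cfg_b
```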

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same flow could support always-on monitoring of additional biomedical time series such as ECG or tremor patterns.
If measured power numbers also stay low, the accelerator enables battery-powered devices that run for days without recharging.
Independent accuracy tests on public gait corpora would show whether the quantization choices truly preserve clinical reliability.
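A back-of-envelope check of the battery-life extension: divide stored energy by average power. The 2.089 mW figure is the power dissipation reported for the highest-accuracy layout (Figure 1 caption); the 225 mAh, 3 V coin cell is a hypothetical choice, and the sketch assumes the accelerator alone draws that power continuously, ignoring sensors, radio, regulator losses, and any power gating.

```python
def battery_life_days(capacity_mah: float, voltage_v: float, avg_power_mw: float) -> float:
    """Idealized runtime: stored energy divided by constant average power."""
    energy_mwh = capacity_mah * voltage_v   # mAh x V = mWh
    hours = energy_mwh / avg_power_mw       # mWh / mW = h
    return hours / 24.0

# Hypothetical coin cell: 225 mAh at a nominal 3.0 V.
# 2.089 mW: reported power of the highest-accuracy layout, assumed drawn continuously.
days = battery_life_days(capacity_mah=225.0, voltage_v=3.0, avg_power_mw=2.089)
print(f"{days:.1f} days")  # prints 13.5 days
```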

Load-bearing premise

Hardware-aware quantization, together with the chosen design-space points, preserves enough numerical precision to keep gait abnormality detection reliable.

What would settle it

Running the accelerator on a standard gait dataset and finding that its accuracy falls below the level required for safe clinical abnormality detection.

Figures

Figures reproduced from arXiv: 2604.13543 by Alar Kuusik, Jaan Raik, Levent Aksoy, Mohammad Eslami, Mohammad Hasan Ahmadilivani.

**Figure 1.** (a) LSTM NN structure; (b) LSTM cell. • Physical designs of two LSTM accelerators, one with the smallest area and the other with the best accuracy, validated on various disease data and pre-trained LSTM models in software. The physical synthesis results show that the design with the best accuracy has a die size of 0.325 mm² with a power dissipation of 2.089 mW, while the one with the smallest area has 15.4…

**Figure 2.** Steps in the design of an LSTM NN hardware accelerator.

**Figure 3.** LSTM accelerator design in hardware. …to parameters and operations of the LSTM NNs. In hardware, the bit-width of parameters affects the memory size, and the bit-width of parameters and operations has an impact on the size of the data path and arithmetic logic. We pinpoint the computations in the software to the corresponding hardware blocks, so the quantization applied at the software level mimics its impa…

**Figure 4.** Exploration of degradation in accuracy (left) and F1-score (right).

**Figure 5.** Layouts with routed view of the selected designs: (a) configuration #7; …

**Figure 6.** Layouts with placement view of the selected designs: (a) configura…
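The Figure 3 caption notes that parameter bit-width sets the memory size. A minimal sketch of that relationship for a single LSTM layer follows; the layer dimensions (6 input features, 32 hidden units) are hypothetical, and the count excludes any output layer. Frameworks such as PyTorch's `torch.nn.LSTM` store two bias vectors per gate (`bias_ih` and `bias_hh`), hence the `biases_per_gate` knob.

```python
def lstm_param_count(input_size: int, hidden_size: int, biases_per_gate: int = 1) -> int:
    """Parameters of one LSTM layer: 4 gates x (input weights + recurrent weights + biases)."""
    per_gate = (hidden_size * input_size
                + hidden_size * hidden_size
                + biases_per_gate * hidden_size)
    return 4 * per_gate

def weight_memory_bytes(input_size: int, hidden_size: int, bit_width: int,
                        biases_per_gate: int = 1) -> float:
    """Storage for the layer's parameters at one uniform bit-width."""
    return lstm_param_count(input_size, hidden_size, biases_per_gate) * bit_width / 8

# Hypothetical layer: 6 input features (e.g. 3-axis accelerometer + 3-axis gyroscope), 32 hidden units.
for bits in (32, 16, 8):
    print(f"{bits:2d}-bit weights: {weight_memory_bytes(6, 32, bits):.0f} bytes")
```

Because storage scales linearly with bit-width, halving the word length halves weight memory; the datapath multipliers shrink faster still, roughly with the square of the operand width for array multipliers.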

Original abstract

Long Short-Term Memory (LSTM) neural networks have penetrated healthcare applications where real-time requirements and edge computing capabilities are essential. Gait analysis that detects abnormal steps to prevent patients from falling is a prominent problem for such applications. Given the extremely stringent design requirements in performance, power dissipation, and area, an Application-Specific Integrated Circuit (ASIC) enables an efficient real-time exploitation of LSTMs for gait analysis, achieving high accuracy. To the best of our knowledge, this work presents the first cross-layer co-optimized LSTM accelerator for real-time gait analysis, targeting an ASIC design. We conduct a comprehensive design space exploration from software down to layout design. We carry out a bit-width optimization at the software level with hardware-aware quantization to reduce the hardware complexity, explore various designs at the register-transfer level, and generate alternative layouts to find efficient realizations of the LSTM accelerator in terms of hardware complexity and accuracy. The physical synthesis results show that, using the 65 nm technology, the die size of the accelerator's layout optimized for the highest accuracy is 0.325 mm², while the alternative design optimized for hardware complexity with a slightly lower accuracy occupies 15.4% smaller area. Moreover, the designed accelerators achieve accurate gait abnormality detection 4.05× faster than the given application requirement.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper delivers a full ASIC flow for an LSTM gait detector with concrete 65 nm area and speedup numbers, but its abstract reports no accuracy metrics or validation details to support its claims.


The main thing to know is that the authors took an LSTM for gait abnormality detection and ran it through bit-width optimization, RTL exploration, and actual layout generation in 65 nm technology. They end up with two designs: one at 0.325 mm² optimized for accuracy and a second that is 15.4% smaller with a minor accuracy trade-off. Both are reported to run 4.05 times faster than the real-time requirement for the application. That end-to-end cross-layer flow is the concrete output here, and they frame it as the first such co-optimized ASIC for this specific use case.

Referee Report

2 major / 1 minor

Summary. The manuscript presents the first cross-layer co-optimized LSTM accelerator for real-time gait analysis on ASIC. It performs design-space exploration starting with hardware-aware bit-width quantization at the software level, followed by RTL design variants and physical layout generation. In 65 nm technology, the accuracy-optimized layout occupies 0.325 mm² while the complexity-optimized variant is 15.4% smaller; both are stated to deliver accurate gait-abnormality detection 4.05× faster than the application requirement.

Significance. If the quantized LSTM retains the necessary detection accuracy, the work would supply a concrete, end-to-end ASIC realization with quantified area and throughput numbers for a healthcare edge-AI task. The explicit cross-layer flow (quantization → RTL → layout) and the reported 65 nm synthesis results would constitute a useful benchmark for similar constrained LSTM deployments.

major comments (2)

[Abstract] Abstract: The central claim that the accelerators 'achieve accurate gait abnormality detection' is unsupported by any quantitative accuracy figures (precision, recall, F1, or detection rate), dataset description (sensor modality, subject count, normal/abnormal sample counts, train/test split), or ablation comparing floating-point versus quantized model performance. This omission is load-bearing because the paper's value proposition rests on the assertion that hardware-aware quantization and layout choices preserve application-level correctness.
[Abstract] Abstract: The reported 4.05× speedup relative to 'the given application requirement' lacks an explicit statement of the latency or throughput target (e.g., maximum allowable inference latency in ms or minimum samples per second) and of the exact throughput measurement used to compute the factor. Without these definitions, the performance claim cannot be reproduced or compared with other accelerators.
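The first comment asks for precision, recall, F1, and a floating-point versus quantized comparison. A minimal sketch of how those figures and the resulting degradation would be computed; the label encoding (1 = abnormal step) and the eight-sample predictions are hypothetical, not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary detector."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels (1 = abnormal step) and predictions from float and quantized models.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_float = [1, 0, 1, 1, 0, 1, 1, 0]
y_quant = [1, 0, 0, 1, 0, 1, 1, 0]

m_float = binary_metrics(y_true, y_float)
m_quant = binary_metrics(y_true, y_quant)
for key in ("accuracy", "f1"):
    print(f"{key}: float {m_float[key]:.3f}, quantized {m_quant[key]:.3f}, "
          f"drop {m_float[key] - m_quant[key]:.3f}")
```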

minor comments (1)

The manuscript would benefit from a summary table that contrasts the two presented layouts against each other and against any prior LSTM accelerators on area, power, latency, and accuracy.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and agree that the abstract requires strengthening with quantitative details to better support the central claims. Revisions will be made to the abstract and relevant sections.


Referee: [Abstract] Abstract: The central claim that the accelerators 'achieve accurate gait abnormality detection' is unsupported by any quantitative accuracy figures (precision, recall, F1, or detection rate), dataset description (sensor modality, subject count, normal/abnormal sample counts, train/test split), or ablation comparing floating-point versus quantized model performance. This omission is load-bearing because the paper's value proposition rests on the assertion that hardware-aware quantization and layout choices preserve application-level correctness.

Authors: We acknowledge that the abstract as currently written does not include these quantitative elements, which weakens the standalone readability of the central claim. The full manuscript does contain the accuracy results, dataset description (including sensor modalities, subject counts, and splits), and floating-point vs. quantized ablation in the experimental evaluation section. To address this, we will revise the abstract to concisely report key metrics (e.g., F1-score or detection accuracy for both model variants), a one-sentence dataset summary, and a note on the negligible accuracy drop post-quantization. This will make the cross-layer co-optimization benefits explicit without altering the manuscript's technical content. revision: yes
Referee: [Abstract] Abstract: The reported 4.05× speedup relative to 'the given application requirement' lacks an explicit statement of the latency or throughput target (e.g., maximum allowable inference latency in ms or minimum samples per second) and of the exact throughput measurement used to compute the factor. Without these definitions, the performance claim cannot be reproduced or compared with other accelerators.

Authors: We agree that the abstract does not explicitly define the application requirement or the measurement basis for the 4.05× factor. The manuscript derives this from the real-time gait analysis constraint (maximum allowable latency per inference for continuous monitoring) and reports post-synthesis throughput in inferences per second. We will revise the abstract to state the exact target (e.g., required samples per second or ms latency bound) and clarify that the factor compares the accelerator's measured throughput against this requirement. Corresponding details will also be added to the results section for reproducibility. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on standard synthesis flows and design exploration


The paper describes a conventional cross-layer flow: hardware-aware bit-width quantization at the software level, RTL design variants, and physical synthesis in 65 nm to obtain area and latency numbers. These outputs are produced by external EDA tools applied to the chosen architectures; they are not obtained by fitting a parameter to a subset of the same data and then relabeling the fit as a prediction, nor by any self-definitional equation, self-citation uniqueness theorem, or ansatz smuggled through prior work. The novelty claim (“first cross-layer co-optimized LSTM accelerator”) is an assertion of priority, not a load-bearing premise that the rest of the derivation depends upon. Consequently the reported 4.05× speed-up and area figures are independent measurements rather than tautological restatements of the inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no identifiable free parameters, axioms, or invented entities; the work relies on standard LSTM models and ASIC design flows from prior literature without new postulates.



