CMSIS-NN: Efficient neural network kernels for ARM Cortex-M CPUs

2018 · cs.NE · arXiv 1801.06601

11 Pith papers cite this work. Polarity classification is still indexing.

abstract

Deep Neural Networks are becoming increasingly popular in always-on IoT edge devices performing data analytics right at the source, reducing latency as well as energy consumption for data communication. This paper presents CMSIS-NN, efficient kernels developed to maximize the performance and minimize the memory footprint of neural network (NN) applications on Arm Cortex-M processors targeted for intelligent IoT edge devices. Neural network inference based on CMSIS-NN kernels achieves 4.6X improvement in runtime/throughput and 4.9X improvement in energy efficiency.

citation-role summary

background 2 · baseline 1

citation-polarity summary

background 2 · baseline 1

representative citing papers

SEABAD: A Tropical Bird Activity Detection Dataset for Passive Acoustic Monitoring

cs.SD · 2026-05-20 · accept · novelty 7.0

SEABAD is a publicly released, balanced dataset of 50,000 curated 16 kHz audio clips spanning 1,677 tropical bird species, with a dual-branch curation pipeline and MobileNetV3-Small baseline reaching 99.57% accuracy.

Learned Memory Attenuation in Sage-Husa Kalman Filters for Robust UAV State Estimation

eess.SP · 2026-05-18 · unverdicted · novelty 7.0

NDR-SHKF replaces the static forgetting factor in Sage-Husa Kalman Filters with a learned vector-valued memory attenuation policy from a bifurcated recurrent network trained end-to-end on whitened innovations to minimize estimation error.

Going Beyond the Edge: Distributed Inference of Transformer Models on Ultra-Low-Power Wireless Devices

cs.LG · 2026-05-15 · conditional · novelty 7.0

CATS enables collaborative transformer inference on up to 16 ultra-low-power wireless devices, supporting models up to 14 times larger than a single device can run via SomeGather pruning and message-dropout robustness.

Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection

cs.LG · 2026-05-04 · unverdicted · novelty 7.0

MP-IB uses an 8x information asymmetry via FP16 trait heads and INT4 state heads to disentangle speaker identity from agitation in voice biomarkers, outperforming larger models on edge devices with low latency and suppressed identity leakage.

AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators

cs.DC · 2026-02-15 · unverdicted · novelty 6.0

AEG baremetal framework achieves 9.2x higher compute efficiency, 3-7x less data movement, and near-zero latency variance for ResNet-18 on 28 AIE tiles versus Linux Vitis AI on 304 tiles while maintaining 68.78% ImageNet accuracy.

Federated Learning with Non-IID Data

cs.LG · 2018-06-02 · conditional · novelty 6.0

Non-IID data causes up to 55% accuracy loss in federated learning due to weight divergence measured by earth mover's distance; 5% globally shared data recovers 30% accuracy on CIFAR-10.

Split CNN Inference on Networked Microcontrollers

cs.DC · 2026-05-10 · unverdicted · novelty 6.0

A fine-grained split inference system enables CNN models infeasible on single MCUs to run across networked devices by partitioning at sub-layer granularity, reducing per-device peak RAM while keeping practical latency.

EdgeSpike: Spiking Neural Networks for Low-Power Autonomous Sensing in Edge IoT Architectures

cs.NE · 2026-04-29 · unverdicted · novelty 6.0

EdgeSpike delivers 91.4% mean accuracy on five sensing tasks with 31x lower energy on neuromorphic hardware and 6.3x longer battery life in a seven-month field deployment compared to conventional CNNs.

Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition

cs.AR · 2026-04-17 · unverdicted · novelty 6.0

A co-design framework using approximate matrix decomposition and genetic algorithms delivers 33% average latency reduction in TinyML CNN FPGA accelerators with 1.3% average accuracy loss versus standard systolic arrays.

Neuromorphic Parameter Estimation for Power Converter Health Monitoring Using Spiking Neural Networks

cs.NE · 2026-04-17 · unverdicted · novelty 6.0

A three-layer leaky integrate-and-fire spiking neural network estimates passive component parameters in power converters, cutting resistance error from 25.8% to 10.2% versus feedforward baselines at projected 270x lower energy on neuromorphic chips.

MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices

cs.CV · 2026-02-06 · conditional · novelty 4.0 · 2 refs

MicroBi-ConvLSTM is a convolutional-recurrent model with 11.4K parameters that delivers competitive accuracy on eight HAR benchmarks and full INT8 deployment coverage on Raspberry Pi Pico 2 and ESP32.

citing papers explorer

Showing 11 of 11 citing papers.

SEABAD: A Tropical Bird Activity Detection Dataset for Passive Acoustic Monitoring · cs.SD · 2026-05-20 · accept · none · ref 78 · internal anchor
Learned Memory Attenuation in Sage-Husa Kalman Filters for Robust UAV State Estimation · eess.SP · 2026-05-18 · unverdicted · none · ref 62 · internal anchor
Going Beyond the Edge: Distributed Inference of Transformer Models on Ultra-Low-Power Wireless Devices · cs.LG · 2026-05-15 · conditional · none · ref 24 · internal anchor
Mixed-Precision Information Bottlenecks for On-Device Trait-State Disentanglement in Bipolar Agitation Detection · cs.LG · 2026-05-04 · unverdicted · none · ref 88
AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators · cs.DC · 2026-02-15 · unverdicted · none · ref 20 · internal anchor
Federated Learning with Non-IID Data · cs.LG · 2018-06-02 · conditional · none · ref 2 · internal anchor
Split CNN Inference on Networked Microcontrollers · cs.DC · 2026-05-10 · unverdicted · none · ref 27
EdgeSpike: Spiking Neural Networks for Low-Power Autonomous Sensing in Edge IoT Architectures · cs.NE · 2026-04-29 · unverdicted · none · ref 45
Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition · cs.AR · 2026-04-17 · unverdicted · none · ref 7
Neuromorphic Parameter Estimation for Power Converter Health Monitoring Using Spiking Neural Networks · cs.NE · 2026-04-17 · unverdicted · none · ref 11
MicroBi-ConvLSTM: An Ultra-Lightweight Efficient Model for Human Activity Recognition on Resource Constrained Devices · cs.CV · 2026-02-06 · conditional · none · ref 32 · 2 links · internal anchor
