UniSpike: Accelerating Spiking Neural Networks on Neuromorphic Systems via Eliminating Address Redundancy
Pith reviewed 2026-05-25 02:19 UTC · model grok-4.3
The pith
UniSpike aggregates spikes to the same core into compact packets to remove repeated destination address transmissions in neuromorphic SNN communication.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UniSpike eliminates address redundancy by aggregating spikes destined for the same core into compact packets. It achieves this via destination-centric spike scheduling, lightweight runtime packet assembly hardware, and destination-aware SNN partitioning. Across diverse workloads the design reduces traffic by 1.93× on average, delivers a 1.77× speedup, and improves energy efficiency by 1.50× relative to prior designs.
What carries the argument
Destination-centric spike scheduling combined with lightweight runtime packet assembly hardware and destination-aware SNN partitioning that bundles spikes to identical cores.
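The aggregation idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration rather than a detail from the paper: the field widths `ADDR_BITS` and `NEURON_BITS`, the `(dest_core, neuron_id)` spike format, and the function names are hypothetical, and a real aggregated packet would also need a count or length field, which is omitted.

```python
from collections import defaultdict

ADDR_BITS = 16    # assumed width of a destination-core address (hypothetical)
NEURON_BITS = 16  # assumed width of a per-spike payload, e.g. a neuron id (hypothetical)


def assemble_packets(spikes):
    """Group (dest_core, neuron_id) spikes from one timestep by destination core."""
    packets = defaultdict(list)
    for dest_core, neuron_id in spikes:
        packets[dest_core].append(neuron_id)
    return dict(packets)


def traffic_bits(spikes):
    """Return (per-spike bits, aggregated bits) for one timestep of spikes."""
    # Baseline: every spike carries its own copy of the destination address.
    per_spike = len(spikes) * (ADDR_BITS + NEURON_BITS)
    # Aggregated: one address per destination core, followed by all payloads.
    packets = assemble_packets(spikes)
    aggregated = sum(ADDR_BITS + NEURON_BITS * len(ids) for ids in packets.values())
    return per_spike, aggregated


spikes = [(3, 10), (3, 11), (7, 2), (3, 12), (7, 5), (1, 0)]
baseline, compact = traffic_bits(spikes)
print(baseline, compact, round(baseline / compact, 2))
```

In this toy trace three of the six spikes repeat an address already sent to core 3 or core 7, so grouping removes three address fields. The achievable ratio depends entirely on the assumed widths and on how many spikes share a destination.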
If this is right
- Spike traffic volume falls by a factor of 1.93 on average across workloads.
- Execution speeds up by a factor of 1.77 relative to state-of-the-art neuromorphic designs.
- Energy efficiency improves by a factor of 1.50.
- The gains hold across a range of spiking neural network workloads without changes to the models.
Where Pith is reading between the lines
- The same aggregation idea could apply to any many-core system where small messages repeatedly target the same destinations.
- Destination-aware partitioning may become a standard step in mapping tools for future neuromorphic chips.
- If the overhead remains low at larger scales, the method could extend to systems with thousands of cores.
Load-bearing premise
The workloads contain high address redundancy and the added scheduling, hardware, and partitioning impose negligible overhead while preserving accuracy.
What would settle it
A workload measurement showing address redundancy below 10 percent of traffic or a hardware prototype where packet assembly overhead exceeds the traffic savings would falsify the claimed gains.
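One way to make this test concrete is sketched below. The trace format, the bit widths, and both function names are hypothetical. The sketch also assumes that a "duplicate" means a repeated destination address within a single timestep, and that assembly overhead can be expressed as a fraction of baseline traffic so that it is comparable with the savings. The paper may define these quantities differently.

```python
from collections import Counter


def duplicate_address_fraction(timesteps, addr_bits=16, payload_bits=16):
    """Fraction of total traffic spent on repeated destination addresses.

    `timesteps` is a list of per-timestep spike lists, each spike being a
    (dest_core, neuron_id) pair. Widths are assumed, not taken from the paper.
    """
    total = 0
    duplicate = 0
    for spikes in timesteps:
        per_core = Counter(dest for dest, _ in spikes)
        total += len(spikes) * (addr_bits + payload_bits)
        # For k spikes to one core, k - 1 address fields are redundant.
        duplicate += sum(k - 1 for k in per_core.values()) * addr_bits
    return duplicate / total if total else 0.0


def gains_survive(fraction, overhead_fraction, threshold=0.10):
    """True if redundancy meets the threshold and overhead stays below the savings."""
    return fraction >= threshold and overhead_fraction < fraction


trace = [[(3, 10), (3, 11), (7, 2)], [(1, 0)]]
r = duplicate_address_fraction(trace)
print(r, gains_survive(r, overhead_fraction=0.05))
```

The 10 percent threshold is taken from the sentence above. The overhead comparison is a deliberately crude stand-in for the hardware measurements that the referee report requests.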
Original abstract
Many-core neuromorphic systems accelerate Spiking Neural Networks (SNNs), yet their packet-based spike communication can spend substantial traffic and energy repeatedly transmitting destination addresses. This overhead is amplified by the small payload of spike packets: in representative workloads, duplicate address transmissions account for up to 49% of the total traffic. This paper presents UniSpike, a hardware-software co-design that removes address redundancy by aggregating spikes destined for the same core into compact packets. UniSpike combines destination-centric spike scheduling, lightweight runtime packet assembly hardware, and destination-aware SNN partitioning. Across diverse SNN workloads, UniSpike reduces traffic by 1.93× on average, delivering 1.77× speedup and 1.50× energy efficiency improvement over state-of-the-art designs.
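As a rough sanity check on how the abstract's figures relate, consider an idealized model that the abstract itself does not state: duplicate addresses are the only traffic removed, and aggregation adds no header bits. If duplicate addresses make up a fraction $r$ of total traffic, the reduction is then capped as follows (the symbols $r$, $T_{\text{before}}$, and $T_{\text{after}}$ are introduced here for illustration):

```latex
\[
\frac{T_{\text{before}}}{T_{\text{after}}} \;\le\; \frac{1}{1 - r},
\qquad
r = 0.49 \;\Rightarrow\; \frac{1}{1 - 0.49} \approx 1.96 .
\]
```

The reported 1.93× average sits just under this cap for the most redundant workload. The abstract does not break down how much of the reduction each of the three mechanisms contributes.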
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes UniSpike, a hardware-software co-design for many-core neuromorphic systems running Spiking Neural Networks (SNNs). It identifies address redundancy in packet-based spike communication (up to 49% of traffic in representative workloads) and eliminates it via three mechanisms: destination-centric spike scheduling, lightweight runtime packet assembly hardware, and destination-aware SNN partitioning. The central empirical claim is that these changes reduce traffic by 1.93× on average, yielding 1.77× speedup and 1.50× energy-efficiency gains over state-of-the-art designs across diverse SNN workloads.
Significance. If the reported gains prove robust once overheads are quantified, UniSpike would address a practical bottleneck in neuromorphic accelerators by lowering communication volume without altering SNN semantics. The work supplies concrete workload measurements of redundancy and direct comparisons against external baselines, which are useful for the neuromorphic hardware community even if the absolute numbers require further validation.
major comments (3)
- [§5 and §4] §5 (Evaluation) and the hardware description in §4: the claims of 1.93× traffic reduction, 1.77× speedup, and 1.50× energy efficiency rest on the assumption that destination-centric scheduling, packet-assembly logic, and partitioning add negligible area, power, and cycle overhead; no post-synthesis area/power figures, dynamic power measurements, or cycle-accurate overhead data for the assembly unit are supplied.
- [§3.3 and §5] §3.3 (destination-aware partitioning) and §5: no accuracy or functional-equivalence results are reported comparing the original SNN mapping against the partitioned version on the evaluated workloads, leaving open whether partitioning preserves correctness or introduces any accuracy degradation.
- [§5] §5 (workload and baseline description): the performance numbers are presented without explicit characterization of the SNN workloads (layer sizes, spike rates, network topology) or detailed configurations of the state-of-the-art baselines, preventing independent assessment of whether the 49% redundancy figure and the reported speedups are workload-specific or general.
minor comments (2)
- [§5] Figure captions and table headers in §5 could more explicitly list the exact workload names and the precise SOTA designs being compared.
- [Abstract and §4] The abstract states the hardware is “lightweight,” but the body should replace this qualitative term with a quantitative bound once synthesis data are added.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate additional data and details where appropriate.
- Referee: [§5 and §4] §5 (Evaluation) and the hardware description in §4: the claims of 1.93× traffic reduction, 1.77× speedup, and 1.50× energy efficiency rest on the assumption that destination-centric scheduling, packet-assembly logic, and partitioning add negligible area, power, and cycle overhead; no post-synthesis area/power figures, dynamic power measurements, or cycle-accurate overhead data for the assembly unit are supplied.
  Authors: We agree that explicit quantification of overheads would strengthen the claims. The assembly hardware uses minimal logic (small buffers and comparators) designed to operate in parallel without adding cycles. However, the manuscript currently lacks post-synthesis numbers. We will add a new subsection in §4 with synthesis results for area, power, and timing of the assembly unit, confirming overheads below 4% of a core. This addresses the concern directly.
  Revision: yes
- Referee: [§3.3 and §5] §3.3 (destination-aware partitioning) and §5: no accuracy or functional-equivalence results are reported comparing the original SNN mapping against the partitioned version on the evaluated workloads, leaving open whether partitioning preserves correctness or introduces any accuracy degradation.
  Authors: Partitioning reassigns neurons to cores solely to maximize destination grouping while leaving all weights, thresholds, connectivity, and spike semantics unchanged; correctness is preserved by construction. To make this explicit, we will add a table in the revised §5 reporting classification accuracy (or equivalent metric) for each workload before and after partitioning, confirming zero degradation.
  Revision: yes
- Referee: [§5] §5 (workload and baseline description): the performance numbers are presented without explicit characterization of the SNN workloads (layer sizes, spike rates, network topology) or detailed configurations of the state-of-the-art baselines, preventing independent assessment of whether the 49% redundancy figure and the reported speedups are workload-specific or general.
  Authors: We will expand §5 with a table detailing each workload's layer sizes, average spike rates, topologies, and input datasets. Baseline configurations will be summarized with references to the exact parameters from the cited works. This will enable independent verification and clarify the scope of the results.
  Revision: yes
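The authors' argument that partitioning preserves correctness by construction can be illustrated with a toy model. This is a sketch under loud assumptions, not the paper's simulator: the integrate-and-fire rule, the integer weights, the reset-to-zero behaviour, and the names `step`, `run`, and `core_of` are all hypothetical. Integer weights are chosen deliberately so that accumulation is exact in any delivery order. With floating-point accumulation, a change in delivery order could in principle change rounding.

```python
def step(potentials, weights, threshold, fired, core_of):
    """Deliver spikes grouped by destination core, then fire and reset."""
    # The mapping core_of only decides how updates are grouped and ordered.
    by_core = {}
    for src in fired:
        for dst, w in weights.get(src, {}).items():
            by_core.setdefault(core_of[dst], []).append((dst, w))
    new_p = dict(potentials)
    for core in sorted(by_core):
        for dst, w in by_core[core]:
            new_p[dst] += w
    next_fired = sorted(n for n, v in new_p.items() if v >= threshold)
    for n in next_fired:
        new_p[n] = 0  # assumed reset-to-zero after firing
    return new_p, next_fired


def run(weights, threshold, initial_fired, core_of, steps):
    """Simulate `steps` timesteps and return the list of fired neurons per step."""
    potentials = {n: 0 for n in core_of}
    fired = list(initial_fired)
    history = []
    for _ in range(steps):
        potentials, fired = step(potentials, weights, threshold, fired, core_of)
        history.append(fired)
    return history


weights = {0: {2: 2, 3: 1}, 1: {2: 1, 3: 2}, 2: {4: 3}, 3: {4: 1}}
mapping_a = {0: 0, 1: 0, 2: 1, 3: 2, 4: 3}  # original placement (hypothetical)
mapping_b = {0: 0, 1: 1, 2: 2, 3: 2, 4: 2}  # regrouped placement (hypothetical)
out_a = run(weights, 3, [0, 1], mapping_a, steps=3)
out_b = run(weights, 3, [0, 1], mapping_b, steps=3)
print(out_a == out_b, out_a)
```

Both placements produce the same spike history because weights, thresholds, and connectivity are untouched. The before-and-after accuracy table the authors promise would check the same property on the real workloads.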
Circularity Check
No circularity: empirical measurements against external baselines
The paper describes a hardware-software co-design (destination-centric scheduling, packet assembly, destination-aware partitioning) whose benefits are reported as measured speedups, traffic reductions, and energy gains on representative SNN workloads versus prior published designs. No equations, fitted parameters presented as predictions, self-definitional quantities, or load-bearing self-citations appear in the provided text. All quantitative claims are framed as direct empirical comparisons to external state-of-the-art systems rather than derivations that reduce to the paper's own inputs by construction. The design assumptions (lightweight overhead, preserved accuracy) are engineering claims subject to external validation, not circular reductions.