
arxiv: 2604.05313 · v1 · submitted 2026-04-07 · 📡 eess.SY · cs.SY

An Ultra-Low-Power Synthesizable Asynchronous AER Encoder for Neuromorphic Edge Devices

Pith reviewed 2026-05-10 20:06 UTC · model grok-4.3

keywords AER · asynchronous encoder · neuromorphic · synthesizable · micropipeline · bundled-data · low-power · edge devices

The pith

A fully synthesizable tree-based AER encoder achieves 33 MEvent/s throughput and 435 fJ per event using only standard digital cells.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a fully synthesizable asynchronous AER encoder for neuromorphic edge devices by employing a tree-based architecture with semi-decoupled micropipeline and bundled-data protocol. This replaces custom latches with standard flip-flops and embeds a random-priority arbiter, allowing complete digital synthesis and place-and-route. A prototype fabricated in 65 nm CMOS validates the approach with measured performance of 33 MEvent/s peak throughput, 50 ns average event latency, and 435 fJ energy per encoded event. This matters for enabling scalable, low-power event encoding in commercial digital processes without specialized design flows.

Core claim

The central claim is that an asynchronous AER encoder can be realized as a purely digital, standard-cell design using a semi-decoupled micropipeline with a bundled-data protocol and an embedded cross-coupled NAND arbiter, achieving high throughput and ultra-low power, as demonstrated by silicon measurements in 65 nm technology.

What carries the argument

A semi-decoupled micropipeline with a bundled-data protocol, which allows standard flip-flops to replace custom latches for synthesizability, combined with an embedded random-priority arbiter that resolves event collisions.
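To make the tree-plus-arbiter idea concrete, here is a minimal behavioral sketch in Python. It is not the authors' RTL: the function names (`arbitrate`, `encode_event`, `drain`) are hypothetical, a seeded pseudo-random choice stands in for the metastability resolution of the cross-coupled NAND arbiter, and no handshake timing is modeled. Each tree node picks one requesting child and contributes one address bit.

```python
import random

def arbitrate(req_left, req_right, rng):
    """Pick a winning child: 0 (left), 1 (right), or None if nobody requests.

    When both children request, the choice is random. This is only a
    stand-in for how a cross-coupled NAND arbiter settles a collision.
    """
    if req_left and req_right:
        return rng.randrange(2)
    if req_left:
        return 0
    if req_right:
        return 1
    return None

def encode_event(requests, rng):
    """Return the address of one granted request, or None if none is pending."""
    n = len(requests)
    if n == 1:
        return 0 if requests[0] else None
    half = n // 2
    left = encode_event(requests[:half], rng)
    right = encode_event(requests[half:], rng)
    winner = arbitrate(left is not None, right is not None, rng)
    if winner is None:
        return None
    # The node's decision selects which half of the address space is used.
    return left if winner == 0 else half + right

def drain(requests, seed=0):
    """Encode pending requests one at a time until none remain."""
    rng = random.Random(seed)
    pending = list(requests)
    order = []
    while any(pending):
        addr = encode_event(pending, rng)
        pending[addr] = False  # the granted event is acknowledged and cleared
        order.append(addr)
    return order

if __name__ == "__main__":
    # All 8 inputs fire at once; every address is emitted exactly once.
    print(drain([True] * 8, seed=1))
```

With all eight inputs asserted, the sketch emits each address exactly once in a seed-dependent order, which is the collision-resolution behavior the summary attributes to the random-priority arbiter.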

Load-bearing premise

That the asynchronous micropipeline and arbiter circuit will operate correctly and without errors when extended to larger numbers of events beyond the tested 8-event prototype.

What would settle it

A fabricated larger-scale version of the encoder showing arbitration conflicts, timing violations, or energy per event significantly above 435 fJ would disprove the claims of reliable scalability and ultra-low power.

Figures

Figures reproduced from arXiv: 2604.05313 by Sahil Shah, Sheng-Yu Peng, Yihui Wang.

Figure 1: The Address-Event Representation (AER) protocol. The AER system ...
Figure 2: The proposed asynchronous bundled-data micropipeline. (a) Datapath ...
Figure 4: (a) Earle latch-based C-elements. (b) Modified 4-phase Semi ...
Figure 5: Multiplexed Arbitration circuit. The inherent metastability of the cross ...
Figure 6: The stochastic arbiters were implemented exclusively ...
Figure 6: Die micrograph and layout details of the fabricated AER encoder in ...
Original abstract

This paper presents a fully synthesizable, tree-based Address-Event Representation (AER) encoder designed for scalable neuromorphic computing systems. To achieve high throughput while maintaining strict compatibility with commercial EDA workflows, the asynchronous design employs a bundled-data protocol within a semi-decoupled micropipeline. The architecture replaces traditional transparent latches with standard edge-triggered flip-flops, enabling digital synthesis and place-and-route (PnR) using Cadence toolkits. A cross-coupled NAND-based random-priority arbiter is embedded within the encoder of each tree node to resolve event collisions efficiently. An 8-event AER prototype is fabricated in 65 nm CMOS technology utilizing a purely digital standard-cell flow. Post-fabrication silicon measurements validate the design, demonstrating a peak throughput of 33 MEvent/s and an average event latency of 50 ns, equating to a propagation delay of 17 ns/(event-bit). The design consumes only 435 fJ per encoded event.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. This paper presents a fully synthesizable tree-based asynchronous AER encoder for neuromorphic edge devices. It uses a bundled-data protocol in a semi-decoupled micropipeline, replaces transparent latches with edge-triggered flip-flops for standard-cell EDA compatibility, and embeds cross-coupled NAND random-priority arbiters at each tree node. An 8-event prototype fabricated in 65 nm CMOS is validated by post-silicon measurements showing 33 MEvent/s peak throughput, 50 ns average latency (17 ns per event-bit), and 435 fJ per encoded event.

Significance. If the central claims hold, the work offers a practical, ultra-low-power synthesizable AER solution that could ease integration of neuromorphic interfaces into digital edge systems. The silicon measurements provide direct empirical support for the prototype's throughput and energy figures, which is a strength for an engineering design paper. The standard-cell flow and avoidance of custom latches are notable for reproducibility and adoption.

major comments (2)
  1. [Abstract] Abstract: The positioning of the design as suitable for 'scalable neuromorphic computing systems' is load-bearing but unsupported beyond the 8-event prototype. No simulation results, timing analysis, or validation of the semi-decoupled micropipeline and embedded arbiter are provided for deeper trees or higher event densities, where accumulated forward-path delay mismatch could violate the bundled-data assumption or introduce unexercised metastability.
  2. [Prototype validation] Prototype results (as reported in the abstract and validation section): The measured peak throughput of 33 MEvent/s and energy of 435 fJ/event lack any baseline comparison to prior AER encoders or details on test conditions (e.g., event injection rate, temperature, or number of trials), which weakens assessment of the claimed performance margins.
minor comments (2)
  1. [Abstract] Abstract: The reported figures (50 ns latency, 17 ns/(event-bit) delay) would benefit from explicit definition of how the per-event-bit metric is derived from the measured latency and whether it accounts for arbitration overhead.
  2. [Abstract] Abstract: Inclusion of error bars, standard deviations, or at least a statement of measurement repeatability would improve the presentation of the silicon results.
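The first minor comment asks how the per-event-bit figure is derived. One plausible reading, offered here as an assumption rather than the authors' stated method, is that the 50 ns average latency is divided by the 3 address bits needed to encode 8 events. The sketch below works through that arithmetic, along with two other quantities implied by the reported numbers. The power figure assumes the 435 fJ per event holds at the peak rate and ignores static leakage.

```python
# Reported figures from the abstract.
PEAK_THROUGHPUT_EVENTS_PER_S = 33e6
AVG_LATENCY_NS = 50.0
ENERGY_PER_EVENT_J = 435e-15
NUM_EVENTS = 8

# Assumption: one address bit per tree level, so 8 events need 3 bits.
address_bits = (NUM_EVENTS - 1).bit_length()

# Assumed derivation of the per-event-bit delay: latency / address bits.
latency_per_bit_ns = AVG_LATENCY_NS / address_bits

# Minimum spacing between consecutive events at peak throughput.
min_event_interval_ns = 1e9 / PEAK_THROUGHPUT_EVENTS_PER_S

# Implied encoder power at the peak rate (dynamic only, by assumption).
power_at_peak_uw = ENERGY_PER_EVENT_J * PEAK_THROUGHPUT_EVENTS_PER_S * 1e6

if __name__ == "__main__":
    print(f"address bits:          {address_bits}")
    print(f"latency per bit (ns):  {latency_per_bit_ns:.2f}")
    print(f"event interval (ns):   {min_event_interval_ns:.2f}")
    print(f"power at peak (uW):    {power_at_peak_uw:.3f}")
```

Under this reading the per-bit delay comes out at roughly 16.7 ns, which rounds to the reported 17 ns/(event-bit). The match is consistent with the assumed derivation but does not confirm it, and it leaves open whether arbitration overhead is included.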

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for the thorough review and valuable feedback. We address each major comment below, agreeing to strengthen the manuscript with additional analysis and details where appropriate.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The positioning of the design as suitable for 'scalable neuromorphic computing systems' is load-bearing but unsupported beyond the 8-event prototype. No simulation results, timing analysis, or validation of the semi-decoupled micropipeline and embedded arbiter are provided for deeper trees or higher event densities, where accumulated forward-path delay mismatch could violate the bundled-data assumption or introduce unexercised metastability.

    Authors: We thank the referee for this observation. The design employs a modular tree structure with replicated semi-decoupled micropipelines and embedded arbiters, which by construction supports scaling to larger event counts without custom cells. The 8-event prototype confirms the functionality and performance in silicon. To directly address the concern, the revised manuscript will include post-synthesis timing analysis and behavioral simulations for deeper trees (such as 16- and 32-event configurations), verifying that delay mismatches do not violate the bundled-data protocol and that metastability is properly resolved at each level. revision: yes

  2. Referee: [Prototype validation] Prototype results (as reported in the abstract and validation section): The measured peak throughput of 33 MEvent/s and energy of 435 fJ/event lack any baseline comparison to prior AER encoders or details on test conditions (e.g., event injection rate, temperature, or number of trials), which weakens assessment of the claimed performance margins.

    Authors: We agree that providing baseline comparisons and test setup details will facilitate better evaluation. The measurements were obtained from the fabricated prototype under controlled conditions. In the revised version, we will augment the results section with a comparison to representative prior AER encoders (including throughput, energy per event, and technology node) and specify the test conditions: event injection at varying rates up to 33 MEvent/s using an on-chip pattern generator, measurements at 25°C, and statistics over 10,000 events across multiple runs. revision: yes

Circularity Check

0 steps flagged

No circularity: design validated by silicon measurements, no derivations or predictions

full rationale

The paper describes a synthesizable asynchronous AER encoder architecture using bundled-data semi-decoupled micropipelines and standard-cell flip-flops, followed by fabrication of an 8-event prototype in 65 nm CMOS and direct post-silicon measurements of throughput, latency, and energy. No equations, first-principles derivations, fitted parameters, or predictions appear in the provided text; performance claims reduce directly to measured silicon results rather than to any self-referential construction, self-citation chain, or ansatz. The scaling assumption for larger trees is stated as a design goal but is not presented as a derived result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard asynchronous circuit design principles and the properties of the 65 nm CMOS process; no free parameters, invented entities, or ad-hoc axioms are introduced beyond domain assumptions.

axioms (1)
  • domain assumption: Bundled-data asynchronous protocols and semi-decoupled micropipelines function correctly when implemented with edge-triggered flip-flops.
    The design replaces transparent latches with flip-flops while preserving asynchronous timing assumptions from prior literature.
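The axiom above rests on the usual bundled-data timing condition: at every stage, the request must reach the capturing flip-flop no earlier than the data has settled plus the setup time. The sketch below checks that condition per stage. It is an illustration only: the `Stage` type, the function names, and all delay values are hypothetical and are not taken from the paper, and only the setup-side constraint is modeled.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    data_delay_ns: float     # worst-case delay of the data path into the stage
    request_delay_ns: float  # matched delay on the accompanying request line
    setup_ns: float          # setup time of the capturing flip-flop

def bundling_slack_ns(stage):
    """Positive slack means the request arrives after the data is stable."""
    return stage.request_delay_ns - (stage.data_delay_ns + stage.setup_ns)

def violations(stages, margin_ns=0.0):
    """Return indices of stages whose slack falls below the required margin."""
    return [i for i, s in enumerate(stages) if bundling_slack_ns(s) < margin_ns]

if __name__ == "__main__":
    # Hypothetical per-level delays for a 3-level tree (8 events).
    tree = [
        Stage(data_delay_ns=1.0, request_delay_ns=1.5, setup_ns=0.1),
        Stage(data_delay_ns=2.0, request_delay_ns=2.05, setup_ns=0.1),
        Stage(data_delay_ns=1.2, request_delay_ns=1.8, setup_ns=0.1),
    ]
    print("stages violating the bundling constraint:", violations(tree))
```

A check of this kind, run on post-layout delays for deeper trees, is the sort of evidence the referee's first major comment asks for, since delay mismatch can accumulate with tree depth.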


