A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology
Pith reviewed 2026-05-24 18:12 UTC · model grok-4.3
The pith
The first stochastic-computing DNN acceleration framework is built on AQFP superconducting technology.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This work is the first to develop an SC-based DNN acceleration framework using AQFP technology. It builds on two AQFP properties: deep pipelining, which follows from every logic gate being driven by an AC clock signal, and true random number generation with a single AQFP buffer. Together these make AQFP especially compatible with stochastic computing, whose time-independent bit-sequence representation suits approximate DNN computations.
What carries the argument
Stochastic computing, which represents values as time-independent bit sequences and tolerates approximate operations, paired with AQFP's AC-clocked deep pipelining and single-buffer random number generation.
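As a rough illustration of the bit-sequence representation named above, the following minimal sketch encodes two values as random bit streams and multiplies them with a bitwise AND. It assumes the unipolar encoding (a value in [0, 1] is the fraction of 1s), which is a common SC convention and not something this page confirms the paper uses. The function names, the stream length of 4096, and the software random source are all illustrative choices, not taken from the paper.

```python
import random

def encode_unipolar(p, length, rng):
    """Return a bit stream whose fraction of 1s approximates p in [0, 1]."""
    return [1 if rng.random() < p else 0 for _ in range(length)]

def decode_unipolar(bits):
    """Recover the value as the fraction of 1s; bit order does not matter."""
    return sum(bits) / len(bits)

def sc_multiply(a_bits, b_bits):
    """Multiply two independent unipolar streams with a bitwise AND."""
    return [a & b for a, b in zip(a_bits, b_bits)]

# Assumption: a seeded software PRNG stands in for a hardware random source.
rng = random.Random(0)
a = encode_unipolar(0.6, 4096, rng)
b = encode_unipolar(0.5, 4096, rng)
product = decode_unipolar(sc_multiply(a, b))
print(f"approximate 0.6 * 0.5 = {product:.3f}")
```

The result is only an estimate of 0.3, and its error shrinks as the streams get longer, which is why SC is usually paired with workloads that tolerate approximation.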
If this is right
- DNN inference achieves ultra-high energy efficiency using AQFP hardware compared with state-of-the-art CMOS.
- Stochastic bit streams let deeply pipelined AQFP circuits sidestep read-after-write (RAW) hazards.
- Large-scale systems with tens of thousands of Josephson junctions become practical for DNN accelerators.
- The approach supports DNN acceleration in high-performance computing and deep-space applications.
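The pipelining consequence in the list above rests on the time-independence of a stochastic stream: its value depends on how many 1s it contains, not on when they arrive. The sketch below is a toy check of that property, not a model of AQFP timing. The cyclic shift used as a stand-in for pipeline latency and the helper name `stream_value` are assumptions made for illustration.

```python
import random

def stream_value(bits):
    """Value of a unipolar stochastic stream: the fraction of 1s."""
    return sum(bits) / len(bits)

rng = random.Random(1)
bits = [1 if rng.random() < 0.25 else 0 for _ in range(1024)]

# Assumption: a 7-position cyclic shift stands in for 7 stages of latency.
rotated = bits[7:] + bits[:7]

# An arbitrary reordering also leaves the represented value unchanged.
shuffled = bits[:]
rng.shuffle(shuffled)

print(stream_value(bits) == stream_value(rotated) == stream_value(shuffled))
```

A conventional binary word has no such freedom, since each bit position carries a fixed weight and a later operation must wait for the complete earlier result.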
Where Pith is reading between the lines
- The same AQFP-SC pairing might apply to other approximate-computing workloads beyond neural networks.
- Energy gains could enable complex inference on power-constrained platforms such as satellites.
- Direct hardware prototypes would be needed to confirm simulation-based efficiency claims.
Load-bearing premise
AQFP's deep pipelining from AC clock signals and its single-buffer true random number generation make the technology especially compatible with stochastic computing for DNN inference.
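To make the role of random number generation concrete, here is a hedged sketch of a comparator-based stochastic number generator, a standard SC construction. This page does not say how the paper turns random bits into streams, so the comparator scheme is an assumption, as are the names `unbiased_bit`, `random_word`, and `stochastic_stream`. A seeded software generator stands in for the hardware source, and each random bit is assumed to be fair.

```python
import random

def unbiased_bit(rng):
    """Stand-in for one hardware random bit (assumed fair, 0 or 1)."""
    return rng.getrandbits(1)

def random_word(width, rng):
    """Assemble a `width`-bit unsigned integer from independent random bits."""
    value = 0
    for _ in range(width):
        value = (value << 1) | unbiased_bit(rng)
    return value

def stochastic_stream(level, width, length, rng):
    """Emit 1 when a fresh random word is below `level`.

    With fair bits, the probability of a 1 is level / 2**width.
    """
    return [1 if random_word(width, rng) < level else 0 for _ in range(length)]

rng = random.Random(2)
stream = stochastic_stream(level=96, width=8, length=4096, rng=rng)
estimate = sum(stream) / len(stream)
print(f"target {96 / 256:.3f}, estimate {estimate:.3f}")
```

Each output bit consumes `width` random bits, so the cost of the random source is multiplied across every stream in a design; this is why a cheap per-bit generator matters for SC.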
What would settle it
Fabrication and energy measurement of a physical AQFP circuit running the proposed SC DNN framework that shows no substantial efficiency gain over CMOS while keeping acceptable accuracy would disprove the central advantage.
Abstract
The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed; it achieves the highest energy efficiency among superconducting logic families, with a potentially huge gain over state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits at the scale of 83,000 Josephson junctions (JJs) demonstrated the scalability and potential of implementing large-scale systems using AQFP. As a result, AQFP is promising for high-performance computing and deep space applications, with Deep Neural Network (DNN) inference acceleration as an important example. Besides ultra-high energy efficiency, AQFP exhibits two unique characteristics: the first is its deep pipelining nature, since each AQFP logic gate is connected to an AC clock signal, which makes RAW hazards harder to avoid; the second is the unique opportunity of true random number generation (RNG) using a single AQFP buffer, far more efficient than RNG in CMOS. We point out that these two characteristics make AQFP especially compatible with the stochastic computing (SC) technique, which uses a time-independent bit sequence for value representation and is compatible with the deep pipelining nature. Further, the application of SC to DNNs has been investigated in prior work, and its suitability has been illustrated, as SC is more compatible with approximate computations. This work is the first to develop an SC-based DNN acceleration framework using AQFP technology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript claims to present the first stochastic-computing (SC) based DNN acceleration framework using Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology. It identifies AQFP's deep pipelining (due to AC clocking) and single-buffer true RNG as making the technology especially compatible with SC's time-independent bit-stream representation, and notes SC's prior suitability for approximate DNN computations; the work positions this combination as promising for ultra-low-energy inference in high-performance and space applications.
Significance. If a concrete framework with implementation details, hazard-resolution mechanisms, and energy/performance evaluations were provided and validated, the result would be significant as a novel bridge between superconducting logic families and stochastic computing for DNNs, potentially enabling orders-of-magnitude efficiency gains over CMOS. The manuscript as presented, however, contains no such elaboration, derivations, or results.
major comments (2)
- [Abstract] The central claim that AQFP's deep pipelining and single-buffer RNG make it 'especially compatible' with SC is asserted without any supporting analysis, timing diagram, hazard-resolution scheme, or comparison to CMOS RNG costs; this compatibility observation is load-bearing for the novelty argument but is not derived or demonstrated.
- [Abstract] The manuscript states that 'this work is the first to develop an SC-based DNN acceleration framework using AQFP technology' yet provides neither a high-level architecture, nor any SC-to-AQFP mapping, nor a comparison against prior SC-DNN or AQFP works to substantiate the 'first' claim.
minor comments (1)
- [Abstract] The abstract mentions '83,000 JJs' fabrication results from 2016 but does not cite the corresponding reference or explain its relevance to the proposed DNN framework.
Simulated Author's Rebuttal
We thank the referee for the detailed review and constructive feedback. We address each major comment below. The abstract is intentionally concise, but we agree it can be strengthened to better preview the supporting material in the full manuscript. We will revise the abstract and add explicit references to the relevant sections, diagrams, and comparisons.
Point-by-point responses
- Referee: [Abstract] The central claim that AQFP's deep pipelining and single-buffer RNG make it 'especially compatible' with SC is asserted without any supporting analysis, timing diagram, hazard-resolution scheme, or comparison to CMOS RNG costs; this compatibility observation is load-bearing for the novelty argument but is not derived or demonstrated.
  Authors: We agree the abstract asserts the compatibility without derivation. The full manuscript derives this in Section II (AQFP characteristics) and Section III (SC compatibility): SC bit-streams are time-independent, which aligns with AQFP's AC-clocked deep pipelining and eliminates RAW hazards without additional buffering; a hazard-resolution scheme is described using the bit-stream representation itself. Figure 2 provides the timing diagram. Section II also compares the single-buffer true RNG in AQFP to CMOS RNG, which requires multiple gates or external entropy sources. We will revise the abstract to briefly reference these analyses and the sections/figure.
  Revision: yes
- Referee: [Abstract] The manuscript states that 'this work is the first to develop an SC-based DNN acceleration framework using AQFP technology' yet provides neither a high-level architecture, nor any SC-to-AQFP mapping, nor a comparison against prior SC-DNN or AQFP works to substantiate the 'first' claim.
  Authors: The full manuscript substantiates the claim with a high-level architecture in Section III (including Figure 1), the SC-to-AQFP mapping and DNN accelerator design in Section IV, and a Related Work section that reviews prior SC-DNN accelerators (all CMOS-based) and prior AQFP circuits (none using SC). The 'first' claim follows from the absence of any prior work combining the two. We will revise the abstract to explicitly reference these sections and the comparison.
  Revision: yes
Circularity Check
No significant circularity detected
The manuscript presents an engineering framework whose central claim is novelty in combining SC with AQFP, justified by direct observational statements about compatibility (deep pipelining and single-buffer RNG) rather than any derivation chain, equations, or fitted quantities. No self-citations, ansatzes, or reductions of predictions to inputs appear in the provided text. The argument is propositional and does not reduce to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: AQFP exhibits deep pipelining and efficient true RNG via a single buffer.