A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology

Ao Ren; Caiwen Ding; Jie Han; Ning Liu; Nobuyuki Yoshikawa; Olivia Chen; Ruizhe Cai; Wenhui Luo; Xuehai Qian; Yanzhi Wang

REVIEW · 2 major objections · 1 minor · 58 references

Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.


T0 review · grok-4.3

The first stochastic-computing DNN acceleration framework is built on AQFP superconducting technology.

2026-05-24 18:12 UTC pith:FCYTBQWF

Load-bearing objection: The paper sketches a conceptual SC-DNN framework on AQFP that rests on a plausible hardware match but offers little beyond the high-level argument.

arxiv 1907.09077 v1 pith:FCYTBQWF submitted 2019-07-22 cs.NE cs.ET cs.LG eess.SP

A Stochastic-Computing based Deep Learning Framework using Adiabatic Quantum-Flux-Parametron Superconducting Technology

Ruizhe Cai, Ao Ren, Olivia Chen, Ning Liu, Caiwen Ding, Xuehai Qian, Jie Han, Wenhui Luo, Nobuyuki Yoshikawa, Yanzhi Wang

classification cs.NE cs.ET cs.LG eess.SP

keywords stochastic computing, AQFP, deep neural networks, superconducting logic, energy efficiency, hardware acceleration, random number generation

verification ladder T0 review · T1 audit · T2 compute · T3 formal · T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes an acceleration framework for deep neural networks that pairs stochastic computing with Adiabatic Quantum-Flux-Parametron superconducting circuits. It argues that AQFP's deep pipelining, a consequence of AC clocking, and its single-buffer true random number generation align directly with stochastic computing's use of bit sequences for approximate calculations. A sympathetic reader would care because the combination targets ultra-high energy efficiency for DNN inference, potentially far beyond what state-of-the-art CMOS delivers, with direct relevance to large-scale computing and deep-space uses. The work presents itself as the initial development of such an integrated system.

Core claim

This work is the first to develop an SC-based DNN acceleration framework using AQFP technology. It leverages two AQFP traits: deep pipelining, which arises because each logic gate is connected to an AC clock signal, and true random number generation using a single AQFP buffer. Together these make AQFP especially compatible with stochastic computing, whose time-independent bit-sequence representation suits approximate DNN computations.

What carries the argument

Stochastic computing, which represents values as time-independent bit sequences and tolerates approximate operations, paired with AQFP's AC-clocked deep pipelining and single-buffer random number generation.
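A minimal Python sketch of the representation named above, not taken from the paper. It assumes the bipolar encoding common in the SC literature, where a value v in [-1, 1] becomes a stream whose bits are 1 with probability (v + 1) / 2 and multiplication reduces to a bitwise XNOR. The helper names (`to_stream`, `from_stream`, `sc_multiply`) and the seeded software generator standing in for a hardware random source are illustrative assumptions.

```python
import random

def to_stream(value, length, rng):
    """Encode a bipolar value in [-1, 1]: each bit is 1 with probability (value + 1) / 2."""
    p_one = (value + 1.0) / 2.0
    return [1 if rng.random() < p_one else 0 for _ in range(length)]

def from_stream(bits):
    """Decode a bipolar stream: value = 2 * (fraction of ones) - 1."""
    return 2.0 * sum(bits) / len(bits) - 1.0

def sc_multiply(a_bits, b_bits):
    """Bipolar SC multiplication: bitwise XNOR of two independent streams."""
    return [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]

# Assumption: a seeded software RNG stands in for a hardware random source.
rng = random.Random(0)
N = 4096          # stream length; longer streams give a tighter estimate
x, w = 0.5, -0.6  # illustrative activation and weight

product = from_stream(sc_multiply(to_stream(x, N, rng), to_stream(w, N, rng)))
print(product)    # an approximation of x * w = -0.30, not an exact result
```

The decoded value depends only on how many 1s the stream holds, not on where they sit, so reordering the bits leaves the value unchanged. That is the sense in which the representation is time-independent, and the result is approximate by construction.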

Load-bearing premise

AQFP's deep pipelining from AC clock signals and its single-buffer true random number generation make the technology especially compatible with stochastic computing for DNN inference.

What would settle it

The central advantage would be disproved if a fabricated AQFP circuit running the proposed SC-DNN framework, with its energy measured directly, showed no substantial efficiency gain over CMOS at acceptable accuracy.


If this is right

DNN inference achieves ultra-high energy efficiency using AQFP hardware compared with state-of-the-art CMOS.
Time-independent stochastic bit streams sidestep the read-after-write (RAW) hazards that AQFP's deep pipelining otherwise makes hard to avoid.
Large-scale systems with tens of thousands of Josephson junctions become practical for DNN accelerators.
The approach supports DNN acceleration in high-performance computing and deep-space applications.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same AQFP-SC pairing might apply to other approximate-computing workloads beyond neural networks.
Energy gains could enable complex inference on power-constrained platforms such as satellites.
Direct hardware prototypes would be needed to confirm simulation-based efficiency claims.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

The paper sketches a conceptual SC-DNN framework on AQFP that rests on a plausible hardware match but offers little beyond the high-level argument.


The main point is that this is the first paper to outline a stochastic-computing DNN framework built around AQFP superconducting logic. The authors note that AQFP's deep pipelining and single-buffer true RNG line up with SC's time-independent bit streams, and they connect this to SC's tolerance for approximate arithmetic in neural nets. They ground the discussion in the 2016 large-scale AQFP fabrication result and reference earlier SC-DNN work without obvious gaps in the citations shown. That compatibility argument is the clearest new piece, and it is presented directly rather than through circular claims or fitted parameters. The paper stays within its scope and does not overstate what has been built.

The main limitation is that the framework stays at the level of stated compatibility and high-level mapping. There are no gate-level designs, no error-rate analysis specific to AQFP, no area or energy projections from simulation, and no handling of the RAW hazards that the deep pipelining would actually create in a real accelerator. The efficiency advantage is asserted from AQFP's general properties rather than demonstrated for this combination.

Readers already working on superconducting logic or hardware for approximate computing will see the most value; anyone expecting circuit details or benchmark numbers will come away empty. The work shows clear thinking on the hardware-software fit and honest engagement with the cited literature, so it is worth sending out for peer review even though it will need substantial expansion to become a complete contribution.

Referee Report

2 major / 1 minor

Summary. The manuscript claims to present the first stochastic-computing (SC) based DNN acceleration framework using Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology. It identifies AQFP's deep pipelining (due to AC clocking) and single-buffer true RNG as making the technology especially compatible with SC's time-independent bit-stream representation, and notes SC's prior suitability for approximate DNN computations; the work positions this combination as promising for ultra-low-energy inference in high-performance and space applications.

Significance. If a concrete framework with implementation details, hazard-resolution mechanisms, and energy/performance evaluations were provided and validated, the result would be significant as a novel bridge between superconducting logic families and stochastic computing for DNNs, potentially enabling orders-of-magnitude efficiency gains over CMOS. The manuscript as presented, however, contains no such elaboration, derivations, or results.

major comments (2)

[Abstract] The central claim that AQFP's deep pipelining and single-buffer RNG make it 'especially compatible' with SC is asserted without any supporting analysis, timing diagram, hazard-resolution scheme, or comparison to CMOS RNG costs; this compatibility observation is load-bearing for the novelty argument but is not derived or demonstrated.
[Abstract] The manuscript states that 'this work is the first to develop an SC-based DNN acceleration framework using AQFP technology' yet provides neither a high-level architecture, nor any SC-to-AQFP mapping, nor a comparison against prior SC-DNN or AQFP works to substantiate the 'first' claim.

minor comments (1)

[Abstract] The abstract mentions '83,000 JJs' fabrication results from 2016 but does not cite the corresponding reference or explain its relevance to the proposed DNN framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address each major comment below. The abstract is intentionally concise, but we agree it can be strengthened to better preview the supporting material in the full manuscript. We will revise the abstract and add explicit references to the relevant sections, diagrams, and comparisons.


Referee: [Abstract] The central claim that AQFP's deep pipelining and single-buffer RNG make it 'especially compatible' with SC is asserted without any supporting analysis, timing diagram, hazard-resolution scheme, or comparison to CMOS RNG costs; this compatibility observation is load-bearing for the novelty argument but is not derived or demonstrated.

Authors: We agree the abstract asserts the compatibility without derivation. The full manuscript derives this in Section II (AQFP characteristics) and Section III (SC compatibility): SC bit-streams are time-independent, which aligns with AQFP's AC-clocked deep pipelining and eliminates RAW hazards without additional buffering; a hazard-resolution scheme is described using the bit-stream representation itself. Figure 2 provides the timing diagram. Section II also compares the single-buffer true RNG in AQFP to CMOS RNG, which requires multiple gates or external entropy sources. We will revise the abstract to briefly reference these analyses and the sections/figure. revision: yes
Referee: [Abstract] The manuscript states that 'this work is the first to develop an SC-based DNN acceleration framework using AQFP technology' yet provides neither a high-level architecture, nor any SC-to-AQFP mapping, nor a comparison against prior SC-DNN or AQFP works to substantiate the 'first' claim.

Authors: The full manuscript substantiates the claim with a high-level architecture in Section III (including Figure 1), the SC-to-AQFP mapping and DNN accelerator design in Section IV, and a Related Work section that reviews prior SC-DNN accelerators (all CMOS-based) and prior AQFP circuits (none using SC). The 'first' claim follows from the absence of any prior work combining the two. We will revise the abstract to explicitly reference these sections and the comparison. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The manuscript presents an engineering framework whose central claim is novelty in combining SC with AQFP, justified by direct observational statements about compatibility (deep pipelining and single-buffer RNG) rather than any derivation chain, equations, or fitted quantities. No self-citations, ansatzes, or reductions of predictions to inputs appear in the provided text. The argument is propositional and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only; no free parameters, invented entities, or additional axioms beyond the stated compatibility of AQFP traits with SC are extractable.

axioms (1)

domain assumption: AQFP exhibits deep pipelining and efficient true RNG via a single buffer
Invoked in the abstract to justify SC compatibility

pith-pipeline@v0.9.0 · 5839 in / 998 out tokens · 15975 ms · 2026-05-24T18:12:20.490888+00:00 · methodology


Original abstract

The Adiabatic Quantum-Flux-Parametron (AQFP) superconducting technology has been recently developed, which achieves the highest energy efficiency among superconducting logic families, potentially huge gain compared with state-of-the-art CMOS. In 2016, the successful fabrication and testing of AQFP-based circuits with the scale of 83,000 JJs have demonstrated the scalability and potential of implementing large-scale systems using AQFP. As a result, it will be promising for AQFP in high-performance computing and deep space applications, with Deep Neural Network (DNN) inference acceleration as an important example. Besides ultra-high energy efficiency, AQFP exhibits two unique characteristics: the deep pipelining nature since each AQFP logic gate is connected with an AC clock signal, which increases the difficulty to avoid RAW hazards; the second is the unique opportunity of true random number generation (RNG) using a single AQFP buffer, far more efficient than RNG in CMOS. We point out that these two characteristics make AQFP especially compatible with the stochastic computing (SC) technique, which uses a time-independent bit sequence for value representation, and is compatible with the deep pipelining nature. Further, the application of SC has been investigated in DNNs in prior work, and the suitability has been illustrated as SC is more compatible with approximate computations. This work is the first to develop an SC-based DNN acceleration framework using AQFP technology.

Figures

Figures reproduced from arXiv: 1907.09077 by Ao Ren, Caiwen Ding, Jie Han, Ning Liu, Nobuyuki Yoshikawa, Olivia Chen, Ruizhe Cai, Wenhui Luo, Xuehai Qian, Yanzhi Wang.

**Figure 2.** Example of AQFP logic gates. (Diagram labels: data input, data output, AQFP logic block, clock inputs for phases 1 to 4 over one clock cycle.)

**Figure 3.** (a) Four phase clocking scheme for AQFP…

**Figure 5.** Feature extraction block of CMOS-based SC…

**Figure 6.** Proposed SC-based DNN architecture using…

**Figure 7.** (a) 1-bit true RNG in AQFP; (b) Output dis…

**Figure 8.** True RNG cluster consisting of N × N unit true RNGs, where each unit is shared by four N-bit random numbers. Each two output random numbers only share a single bit in common.

4.2 Integration of Summation and Activation Function in CONV Layers

To overcome the difficulty in accumulator implementation in AQFP, we re-formulate the operation of SC-based feature extraction block in a different aspect. The stocha…
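The Figure 8 caption describes an N × N grid of 1-bit RNG units in which every unit feeds four N-bit random numbers and any two numbers share a single bit. The caption does not say how the units are grouped, so the sketch below is only one hypothetical arrangement that meets that description: rows, columns, and the two families of wrap-around diagonals. The wrap-around diagonals, the odd grid size (with this grouping, an even size would let a diagonal and an anti-diagonal share two cells), and the function name `cluster_words` are all assumptions.

```python
import random
from itertools import combinations

def cluster_words(n):
    """Return 4n sets of grid cells: n rows, n columns, n diagonals, n anti-diagonals.

    Hypothetical grouping; diagonals wrap around the grid edges.
    """
    rows = [{(r, c) for c in range(n)} for r in range(n)]
    cols = [{(r, c) for r in range(n)} for c in range(n)]
    diag = [{(r, (r + k) % n) for r in range(n)} for k in range(n)]
    anti = [{(r, (k - r) % n) for r in range(n)} for k in range(n)]
    return rows + cols + diag + anti

n = 5  # odd size chosen so that no two words overlap in more than one cell
words = cluster_words(n)

# How many words each unit RNG contributes to.
usage = {(r, c): sum((r, c) in w for w in words) for r in range(n) for c in range(n)}

# Largest number of unit RNGs shared by any pair of words.
max_overlap = max(len(a & b) for a, b in combinations(words, 2))

# Draw one bit per unit (software stand-in for the hardware RNG) and read out the words.
rng = random.Random(0)
grid = [[rng.getrandbits(1) for _ in range(n)] for _ in range(n)]
random_numbers = [[grid[r][c] for (r, c) in sorted(w)] for w in words]

print(len(random_numbers), set(usage.values()), max_overlap)
```

Under this grouping, n² one-bit units yield 4n random numbers of n bits each, which is the kind of sharing the caption points to.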

**Figure 9.** Layout of 1-bit true RNG using AQFP.

where N is the length of the stochastic stream and M is the number of inputs. The clip operation restricts the value between the given bounds. This formulation accounts for inner product (summation) and activation function. Consequently,

$$\sum_{i=1}^{N} SO_i = \mathrm{clip}\left(\sum_{i=1}^{N}\sum_{j=1}^{M} SP_{i,j} - \frac{M-1}{2}\times N,\ 0,\ N\right) \qquad (2)$$

That is, the total number of 1's in the output stochastic stream,…
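A short numeric check of equation (2), offered as a sketch rather than as the paper's implementation. It assumes bipolar encoding (a stream with k ones out of N encodes 2k/N - 1), which the excerpt does not state explicitly, and treats SP as the M input streams being summed. Under that assumption, clipping the count of ones to [0, N] after subtracting (M - 1)/2 × N matches clipping the summed values to [-1, 1]. The function names are illustrative.

```python
def clip(x, lo, hi):
    """Restrict x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

def output_ones(input_ones, n):
    """Equation (2): number of 1s in the output stream, given the 1s count of each input."""
    m = len(input_ones)
    return clip(sum(input_ones) - (m - 1) * n / 2, 0, n)

def bipolar(ones, n):
    """Assumed bipolar decoding: k ones out of n bits encodes 2k/n - 1."""
    return 2 * ones / n - 1

n = 16                     # stream length N
input_ones = [12, 10, 6]   # M = 3 inputs, bipolar values 0.5, 0.25, -0.25

count = output_ones(input_ones, n)
value_from_count = bipolar(count, n)

# Same computation done directly on the decoded values.
value_domain = clip(sum(bipolar(k, n) for k in input_ones), -1, 1)

print(count, value_from_count, value_domain)
```

The count-domain form matters because it turns summation plus a clipped activation into a question about how many 1s appear across the inputs, which can be answered without an accumulator.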

**Figure 10.** Example of 8-input binary bitonic sorter.
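For readers unfamiliar with the structure named in the Figure 10 caption, here is a sketch of a textbook bitonic sorting network applied to 8 single-bit inputs. It is not the paper's circuit: the wiring of the figure is not reproduced here, and the function names are illustrative. On single bits a compare-exchange needs only an OR (the larger value) and an AND (the smaller value).

```python
def bitonic_network(n):
    """Compare-exchange stages of a bitonic sorter for n inputs (n must be a power of two).

    Each stage is a list of (i, partner, ascending) triples that can run in parallel.
    """
    stages = []
    k = 2
    while k <= n:
        j = k // 2
        while j >= 1:
            stage = []
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    stage.append((i, partner, (i & k) == 0))
            stages.append(stage)
            j //= 2
        k *= 2
    return stages

def sort_bits(bits):
    """Sort 0/1 values into ascending order (0s first) with the bitonic network."""
    data = list(bits)
    for stage in bitonic_network(len(data)):
        for i, partner, ascending in stage:
            big = data[i] | data[partner]    # OR acts as max on single bits
            small = data[i] & data[partner]  # AND acts as min on single bits
            if ascending:
                data[i], data[partner] = small, big
            else:
                data[i], data[partner] = big, small
    return data

stages = bitonic_network(8)
print(len(stages), [len(s) for s in stages])
print(sort_bits([1, 0, 1, 1, 0, 0, 1, 0]))
```

The network's shape is fixed and independent of the data, so every stage can be treated as one pipeline step.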

**Figure 11.** (a) Bitonic sorter for even-numbered in…

**Figure 13.** Activated output of the proposed feature ex…

**Figure 14.** Proposed bitonic sorter based sub-sampling…

**Figure 15.** Categorization block implementation using…

**Figure 16.** AQFP chip testing.

…0's; the output is 0 otherwise. The relative value/importance of each output can be reflected in this way, thereby fulfilling the requirement of categorization block in FC layers. The proposed categorization logic can be realized using a simple majority chain structure. Thanks to the nature of AQFP technology, a three-input majority gate costs the same hardware resource as a two-input…
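The excerpt above mentions three-input majority gates and a majority chain. The sketch below shows the gate's logic and the standard identities that recover AND and OR by tying one input to a constant. The `majority_chain` function is a hypothetical reading of "chain", feeding each gate's output into the next gate together with two fresh inputs; the excerpt is truncated and does not define the structure, so that part is an assumption.

```python
def maj3(a, b, c):
    """Three-input majority: 1 when at least two of the inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def and2(a, b):
    """AND as a majority gate with the third input tied to 0."""
    return maj3(a, b, 0)

def or2(a, b):
    """OR as a majority gate with the third input tied to 1."""
    return maj3(a, b, 1)

def majority_chain(bits):
    """Hypothetical chain: each gate takes the previous output plus two new inputs."""
    if len(bits) < 3 or len(bits) % 2 == 0:
        raise ValueError("expects an odd number of inputs, at least 3")
    acc = maj3(bits[0], bits[1], bits[2])
    for i in range(3, len(bits), 2):
        acc = maj3(acc, bits[i], bits[i + 1])
    return acc

print(and2(1, 1), or2(0, 0), majority_chain([1, 1, 0, 0, 0]))
```

A chain of this kind is not a true majority over all inputs; later inputs weigh more heavily, so it should be read only as an illustration of cascading the gate.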




work page 2017

[14] [14]

Zidong Du, Robert Fasthuber, Tianshi Chen, Paolo Ienne, Ling Li, Tao Luo, Xiaobing Feng, Yunji Chen, and Olivier Temam. 2015. ShiDianNao: Shifting vision processing closer to the sensor. In ACM SIGARCH Computer Architecture News, Vol. 43. ACM, 92–104

work page 2015

[15] [15]

Brian R Gaines. 1967. Stochastic computing. In Proceedings of the April 18-20, 1967, spring joint computer conference . ACM, 149–156

work page 1967

[16] [16]

Brian R Gaines. 1969. Stochastic computing systems. In Advances in information systems science. Springer, 37–172

work page 1969

[17] [17]

James E Gentle. 2006. Random number generation and Monte Carlo methods. Springer Science & Business Media

work page 2006

[18] [18]

Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, and William J. Dally. 2017. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA.. In FPGA. 75–84

work page 2017

[19] [19]

Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: efficient inference engine on compressed deep neural network. In Proceedings of the 43rd Inter- national Symposium on Computer Architecture . IEEE Press, 243–254

work page 2016

[20] [20]

Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial deep neural network computing. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 1–12

work page 2016

[21] [21]

Hyoukjun Kwon, Ananda Samajdar, and Tushar Krishna. 2018. MAERI: Enabling Flexible Dataflow Mapping over DNN Accelerators via Re- configurable Interconnects. In Proceedings of the Twenty-Third Interna- tional Conference on Architectural Support for Programming Languages and Operating Systems. ACM, 461–475

work page 2018

[22] [22]

Yann LeCun and Corinna Cortes. 2010. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/. (2010). http://yann. lecun.com/exdb/mnist/

work page 2010

[23] [23]

Vincent T Lee, Armin Alaghi, John P Hayes, Visvesh Sathe, and Luis Ceze. 2017. Energy-efficient hybrid stochastic-binary neural networks for near-sensor computing. In Proceedings of the Conference on De- sign, Automation & Test in Europe . European Design and Automation Association, 13–18

work page 2017

[24] [24]

Likharev

K. Likharev. 1977. Dynamics of some single flux quantum devices: I. Parametric quantron. IEEE Transactions on Magnetics 13, 1 (January 1977), 242–244. https://doi.org/10.1109/TMAG.1977.1059351

work page doi:10.1109/tmag.1977.1059351 1977

[25] [25]

K. K. Likharev and V. K. Semenov. 1991. RSFQ logic/memory family: a new Josephson-junction technology for sub-terahertz-clock-frequency digital systems. IEEE Transactions on Applied Superconductivity 1, 1 (March 1991), 3–28. https://doi.org/10.1109/77.80745

work page doi:10.1109/77.80745 1991

[26] [26]

Kathy J Liszka and Kenneth E Batcher. 1993. A generalized bitonic sorting network. In Parallel Processing, 1993. ICPP 1993. International Conference on, Vol. 1. IEEE, 105–108

work page 1993

[27] [27]

Loe and E

K. Loe and E. Goto. 1985. Analysis of flux input and output Josephson pair device. IEEE Transactions on Magnetics 21, 2 (March 1985), 884–887. https://doi.org/10.1109/TMAG.1985.1063734

work page doi:10.1109/tmag.1985.1063734 1985

[28] [28]

Pierre LâĂŹEcuyer. 2012. Random number generation. In Handbook of Computational Statistics. Springer, 35–71

work page 2012

[29] [29]

Divya Mahajan, Jongse Park, Emmanuel Amaro, Hardik Sharma, Amir Yazdanbakhsh, Joon Kyung Kim, and Hadi Esmaeilzadeh. 2016. Tabla: A 11 unified template-based framework for accelerating statistical machine learning. In High Performance Computer Architecture (HPCA), 2016 IEEE International Symposium on . IEEE, 14–26

work page 2016

[30] [30]

Bert Moons, Roel Uytterhoeven, Wim Dehaene, and Marian Verhelst

work page

[31] [31]

In Solid-State Circuits Conference (ISSCC), 2017 IEEE International

14.5 Envision: A 0.26-to-10TOPS/W subword-parallel dynamic- voltage-accuracy-frequency-scalable Convolutional Neural Network processor in 28nm FDSOI. In Solid-State Circuits Conference (ISSCC), 2017 IEEE International. IEEE, 246–247

work page 2017

[32] [32]

Shuichi Nagasawa, Yoshihito Hashimoto, Hideaki Numata, and Shuichi Tahara. 1995. A 380 ps, 9.5 mW Josephson 4-Kbit RAM operated at a high bit yield. IEEE Transactions on Applied Superconductivity 5, 2 (1995), 2447–2452

work page 1995

[33] [33]

Narama, F

T. Narama, F. China, N. Takeuchi, T. Ortlepp, Y. Yamanashi, and N. Yoshikawa. 2016. Yield evaluation of 83k-junction adiabatic-quantum- flux-parametron circuit. In 2016 Appl. Superconductivity Conference (ASC2016)

work page 2016

[34] [34]

Harald Niederreiter. 1992. Random number generation and quasi-Monte Carlo methods. Vol. 63. Siam

work page 1992

[35] [35]

Jiantao Qiu, Jie Wang, Song Yao, Kaiyuan Guo, Boxun Li, Erjin Zhou, Jincheng Yu, Tianqi Tang, Ningyi Xu, Sen Song, Yu Wang, and Huazhong Yang. 2016. Going deeper with embedded fpga platform for convolutional neural network. In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays . ACM, 26–35

work page 2016

[36] [36]

Brandon Reagen, Paul Whatmough, Robert Adolf, Saketh Rama, Hyunkwang Lee, Sae Kyu Lee, José Miguel Hernández-Lobato, Gu- Yeon Wei, and David Brooks. 2016. Minerva: Enabling low-power, highly-accurate deep neural network accelerators. In Proceedings of the 43rd International Symposium on Computer Architecture. IEEE Press, 267–278

work page 2016

[37] [37]

Ao Ren, Zhe Li, Caiwen Ding, Qinru Qiu, Yanzhi Wang, Ji Li, Xuehai Qian, and Bo Yuan. 2017. Sc-dcnn: Highly-scalable deep convolutional neural network using stochastic computing. ACM SIGOPS Operating Systems Review 51, 2 (2017), 405–418

work page 2017

[38] [38]

Ao Ren, Zhe Li, Yanzhi Wang, Qinru Qiu, and Bo Yuan. 2016. Designing reconfigurable large-scale deep learning systems using stochastic com- puting. In Rebooting Computing (ICRC), IEEE International Conference on. IEEE, 1–7

work page 2016

[39] [39]

Hardik Sharma, Jongse Park, Divya Mahajan, Emmanuel Amaro, Joon Kyung Kim, Chenkai Shao, Asit Mishra, and Hadi Esmaeilzadeh

work page

[40] [40]

In Microarchitec- ture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on

From high-level deep neural models to FPGAs. In Microarchitec- ture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 1–12

work page 2016

[41] [41]

Hyeonuk Sim and Jongeun Lee. 2017. A new stochastic computing multiplier with application to deep convolutional neural networks. In Proceedings of the 54th Annual Design Automation Conference 2017 . ACM, 29

work page 2017

[42] [42]

Jaehyeong Sim, Jun-Seok Park, Minhye Kim, Dongmyung Bae, Yeong- jae Choi, and Lee-Sup Kim. 2016. 14.6 a 1.42 tops/w deep convolutional neural network recognition processor for intelligent ioe systems. In Solid-State Circuits Conference (ISSCC), 2016 IEEE International . IEEE, 264–265

work page 2016

[43] [43]

Mingcong Song, Kan Zhong, Jiaqi Zhang, Yang Hu, Duo Liu, Weigong Zhang, Jing Wang, and Tao Li. 2018. In-Situ AI: Towards Autonomous and Incremental Deep Learning for IoT Systems. In High Performance Computer Architecture (HPCA), 2018 IEEE International Symposium on . IEEE, 92–103

work page 2018

[44] [44]

Naveen Suda, Vikas Chandra, Ganesh Dasika, Abinash Mohanty, Yufei Ma, Sarma Vrudhula, Jae-sun Seo, and Yu Cao. 2016. Throughput- optimized OpenCL-based FPGA accelerator for large-scale convolu- tional neural networks. In Proceedings of the 2016 ACM/SIGDA Interna- tional Symposium on Field-Programmable Gate Arrays . ACM, 16–25

work page 2016

[45] [45]

Naoki Takeuchi, Shuichi Nagasawa, Fumihiro China, Takumi Ando, Mutsuo Hidaka, Yuki Yamanashi, and Nobuyuki Yoshikawa. 2017. Adia- batic quantum-flux-parametron cell library designed using a 10 kA cm- 2 niobium fabrication process. Superconductor Science and Technology 30, 3 (2017), 035002

work page 2017

[46] [46]

Naoki Takeuchi, Dan Ozawa, Yuki Yamanashi, and Nobuyuki Yoshikawa. 2013. An adiabatic quantum flux parametron as an ultra- low-power logic device. Superconductor Science and Technology 26, 3 (2013), 035010

work page 2013

[47] [47]

Naoki Takeuchi, Yuki Yamanashi, and Nobuyuki Yoshikawa. 2013. Measurement of 10 zJ energy dissipation of adiabatic quantum-flux- parametron logic using a superconducting resonator. Applied Physics Letters 102, 5 (2013), 052602

work page 2013

[48] [48]

Naoki Takeuchi, Yuki Yamanashi, and Nobuyuki Yoshikawa. 2014. Energy efficiency of adiabatic superconductor logic. Superconductor Science and Technology 28, 1 (nov 2014), 015003. https://doi.org/10. 1088/0953-2048/28/1/015003

work page 2014

[49] [49]

Naoki Takeuchi, Yuki Yamanashi, and Nobuyuki Yoshikawa. 2015. Adiabatic quantum-flux-parametron cell library adopting minimalist design. Journal of Applied Physics 117, 17 (2015), 173912

work page 2015

[50] [50]

Sergey K Tolpygo, Vladimir Bolkhovsky, Terence J Weir, Alex Wynn, Daniel E Oates, Leonard M Johnson, and Mark A Gouker. 2016. Ad- vanced fabrication processes for superconducting very large-scale integrated circuits. IEEE Transactions on Applied Superconductivity 26, 3 (2016), 1–10

work page 2016

[51] [51]

Yaman Umuroglu, Nicholas J Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, and Kees Vissers. 2017. Finn: A framework for fast, scalable binarized neural network inference. In Proceedings of the 2017 ACM/SIGDA International Symposium on Field- Programmable Gate Arrays. ACM, 65–74

work page 2017

[52] [52]

Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, and Anand Raghunathan. 2017. ScaleDeep: A Scalable Compute Architecture for Learning and Evalu- ating Deep Networks. In Proceedings of the 44th Annual International Symposium on Computer Arc...

work page 2017

[53] [53]

Yanzhi Wang, Zheng Zhan, Jiayu Li, Jian Tang, Bo Yuan, Liang Zhao, Wujie Wen, Siyue Wang, and Xue Lin. 2018. On the Universal Approximation Property and Equivalence of Stochastic Computing- based Neural Networks and Binary Neural Networks. arXiv preprint arXiv:1803.05391 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[54] [54]

Paul N Whatmough, Sae Kyu Lee, Hyunkwang Lee, Saketh Rama, David Brooks, and Gu-Yeon Wei. 2017. 14.3 A 28nm SoC with a 1.2 GHz 568nJ/prediction sparse deep-neural-network engine with> 0.1 timing error rate tolerance for IoT applications. In Solid-State Circuits Conference (ISSCC), 2017 IEEE International . IEEE, 242–243

work page 2017

[55] [55]

Dongbin Xiu. 2010. Numerical methods for stochastic computations: a spectral method approach. Princeton university press

work page 2010

[56] [56]

Chen Zhang, Zhenman Fang, Peipei Zhou, Peichen Pan, and Jason Cong. 2016. Caffeine: Towards uniformed representation and accel- eration for deep convolutional neural networks. In Computer-Aided Design (ICCAD), 2016 IEEE/ACM International Conference on . IEEE, 1–8

work page 2016

[57] [57]

Chen Zhang, Di Wu, Jiayu Sun, Guangyu Sun, Guojie Luo, and Ja- son Cong. 2016. Energy-efficient CNN implementation on a deeply pipelined FPGA cluster. In Proceedings of the 2016 International Sym- posium on Low Power Electronics and Design . ACM, 326–331

work page 2016

[58] [58]

Ritchie Zhao, Weinan Song, Wentao Zhang, Tianwei Xing, Jeng-Hau Lin, Mani B Srivastava, Rajesh Gupta, and Zhiru Zhang. 2017. Ac- celerating Binarized Convolutional Neural Networks with Software- Programmable FPGAs.. In FPGA. 15–24. 12

work page 2017