Neural Network Training with Approximate Logarithmic Computations

Arnab Sanyal; Keith M. Chugg; Peter A. Beerel

arxiv: 1910.09876 · v1 · submitted 2019-10-22 · 💻 cs.LG · stat.ML


Pith reviewed 2026-05-24 16:15 UTC · model grok-4.3


keywords: neural network training, logarithmic computations, fixed-point arithmetic, approximate computing, multiplication-free, edge devices, deep learning


The pith

Approximate log-domain arithmetic lets neural networks train end-to-end in fixed-point with accuracy within 1% of floating-point.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that the full training procedure for deep neural networks, forward and backward passes included, can be carried out using only approximate log-domain operations on fixed-point numbers. In the log domain every multiplication becomes an addition, and the additions that remain (log-domain additions, which are costlier than ordinary ones) are approximated with small look-up tables and bit-shifts. A reader would care because the approach removes the need for multipliers and floating-point hardware, and could make on-device training practical on edge hardware at potentially much lower implementation cost. Experiments show that the 16-bit version stays within roughly 1% of the accuracy of standard floating-point training on several commonly used classification datasets.

Core claim

An end-to-end training and inference scheme implemented entirely in the log domain with fixed-point representations and hardware-friendly approximations of log-domain addition (based on look-up tables and bit-shifts) achieves classification accuracy within approximately 1% of equivalent floating-point baselines on commonly used datasets.

What carries the argument

Hardware-friendly approximation of log-domain addition via look-up tables and bit-shifts, applied throughout the entire training procedure to eliminate multiplications.
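The mechanism can be sketched in a few lines of Python. This is a minimal illustration for positive operands only, assuming base-2 logarithms; the table step (`STEP`), table size (`SIZE`), and the function names are hypothetical choices made for this sketch, not values or code taken from the paper.

```python
import math

# Hypothetical table: samples of log2(1 + 2^-d) at d = 0, 0.5, 1.0, ...
# STEP and SIZE are illustrative choices, not values taken from the paper.
STEP = 0.5
SIZE = 20
LUT = [math.log2(1.0 + 2.0 ** (-i * STEP)) for i in range(SIZE)]

def log_add_exact(x: float, y: float) -> float:
    """Return log2(2^x + 2^y) for two positive numbers given by their logs."""
    hi, lo = max(x, y), min(x, y)
    return hi + math.log2(1.0 + 2.0 ** (lo - hi))

def log_add_lut(x: float, y: float) -> float:
    """Approximate log2(2^x + 2^y) with a max, a table lookup, and an add."""
    hi, lo = max(x, y), min(x, y)
    idx = int((hi - lo) / STEP)                   # quantize the gap d = hi - lo
    correction = LUT[idx] if idx < SIZE else 0.0  # correction vanishes for large d
    return hi + correction

# 3 * 5 becomes an addition of logs; 3 + 5 becomes a log-domain add.
a, b = math.log2(3.0), math.log2(5.0)
product = 2.0 ** (a + b)
approx_sum = 2.0 ** log_add_lut(a, b)
```

Because the table index is rounded down and the correction term decreases with the gap, this particular sketch never underestimates the sum; a finer table step trades memory for accuracy.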

If this is right

Multiplications are removed from both forward and backward passes.
The entire procedure runs with fixed-point data representations.
Hardware implementation complexity drops because only additions, shifts, and table lookups remain.
Online and real-time training on edge devices becomes more feasible.
Classification accuracy stays within about 1% of floating-point results on standard datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same log-domain replacement could be tested on regression or reinforcement-learning tasks to see whether the 1% tolerance holds outside classification.
Hardware designers could now build accelerators that contain only log-addition units instead of full multipliers.
If the approximation tables are made task-specific, further reductions in bit width might remain accurate.

Load-bearing premise

The specific hardware-friendly approximations to log-domain addition preserve sufficient gradient information and training dynamics to reach convergence comparable to floating-point training.

What would settle it

Training a standard network such as a CNN on MNIST or CIFAR-10 with the 16-bit log-domain method and measuring a classification accuracy drop larger than 1% relative to the matching floating-point run would falsify the central claim.

Original abstract

The high computational complexity associated with training deep neural networks limits online and real-time training on edge devices. This paper proposes an end-to-end training and inference scheme that eliminates multiplications by approximate operations in the log-domain, which has the potential to significantly reduce implementation complexity. We implement the entire training procedure in the log-domain, with fixed-point data representations. This training procedure is inspired by hardware-friendly approximations of log-domain addition which are based on look-up tables and bit-shifts. We show that our 16-bit log-based training can achieve classification accuracy within approximately 1% of the equivalent floating-point baselines for a number of commonly used datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes an end-to-end training and inference scheme for deep neural networks performed entirely in the log-domain with fixed-point representations. Log-domain addition is approximated via hardware-friendly look-up tables and bit-shifts to eliminate multiplications. The central empirical claim is that 16-bit log-based training reaches classification accuracy within approximately 1% of equivalent floating-point baselines on standard datasets.

Significance. If the approximations preserve gradient information and training dynamics as claimed, the approach could enable multiplication-free training on edge devices, addressing a key barrier to online and real-time learning. The emphasis on fixed-point and LUT/bit-shift approximations aligns with practical hardware constraints and provides a concrete empirical validation path.

major comments (2)

[Abstract] The central claim that 16-bit log-based training achieves accuracy 'within approximately 1%' of floating-point baselines comes with no details on the exact approximation functions, datasets, network architectures, training hyperparameters, or statistical significance. This information is load-bearing for evaluating whether the approximations preserve sufficient gradient information.
[Method] The description of back-propagation under the approximate log-domain addition (likely in the method section) must explicitly show how the approximations affect gradient computation; without this, it is unclear whether the reported accuracy is due to preserved dynamics or other factors.

minor comments (1)

Clarify the fixed-point bit widths and LUT sizes used in the 16-bit implementation to support reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive recommendation of minor revision and the helpful comments. We address each point below and will update the manuscript accordingly.


Referee: [Abstract] The central claim that 16-bit log-based training achieves accuracy 'within approximately 1%' of floating-point baselines comes with no details on the exact approximation functions, datasets, network architectures, training hyperparameters, or statistical significance. This information is load-bearing for evaluating whether the approximations preserve sufficient gradient information.

Authors: We agree the abstract is concise and omits specifics. The manuscript details the LUT/bit-shift approximations in Section 3, and the datasets (MNIST, Fashion-MNIST, EMNIST), architecture, and hyperparameters in Section 5. We will revise the abstract to name the datasets and the approximation approach while preserving its length. We will also report multiple runs with standard deviations to address statistical significance. Revision: yes.
Referee: [Method] The description of back-propagation under the approximate log-domain addition (likely in the method section) must explicitly show how the approximations affect gradient computation; without this, it is unclear whether the reported accuracy is due to preserved dynamics or other factors.

Authors: Section 4 describes the full log-domain training procedure, with forward and backward passes both using the approximate addition. The approximations are constructed to remain differentiable so that gradients can be computed via the chain rule through the log-add operation. To make the effect on gradients fully explicit, we will add a short derivation subsection showing the gradient expression under the LUT/bit-shift approximation. Revision: yes.
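For orientation, the exact (unapproximated) log-domain add has a simple closed-form gradient. The block below is a textbook calculation in base 2 for positive operands, with $X$ and $Y$ denoting the log-domain inputs; it is not a derivation taken from the manuscript, and how the paper treats gradients of the approximate operation is not stated on this page.

```latex
\[
Z = \log_2\!\left(2^{X} + 2^{Y}\right)
\quad\Longrightarrow\quad
\frac{\partial Z}{\partial X} = \frac{2^{X}}{2^{X} + 2^{Y}} = \frac{1}{1 + 2^{-(X - Y)}},
\qquad
\frac{\partial Z}{\partial Y} = 1 - \frac{\partial Z}{\partial X}.
\]
```

If the correction term were replaced by a piecewise-constant table, its derivative would be zero almost everywhere, so under that assumption the gradient would flow entirely to the larger operand.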

Circularity Check

0 steps flagged

No significant circularity; empirical validation only


The paper presents an empirical method for log-domain neural network training using hardware-friendly approximations (LUTs and bit-shifts) to log-domain addition, with the central claim being that 16-bit fixed-point log training reaches accuracy within ~1% of floating-point baselines on standard datasets. No derivation chain, first-principles prediction, or fitted parameter is present; the argument consists of implementation description followed by direct experimental comparison. No self-citations, self-definitional steps, or renamings of known results are load-bearing. The result is self-contained against external benchmarks (floating-point baselines) and does not reduce to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The approach rests on the standard mathematical identity that multiplication becomes addition under logarithm, plus domain assumptions about acceptable approximation error in neural network training. No free parameters or invented entities are described in the abstract.

axioms (2)

standard math: Logarithm turns multiplication into addition. Fundamental property invoked to eliminate multiplications in the training procedure.
domain assumption: Approximate log-domain addition via LUTs and shifts is sufficiently accurate for gradient-based optimization. Assumption required for the training procedure to converge to usable models.



Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 7 internal anchors

[1] Neural Network Training with Approximate Logarithmic Computations

INTRODUCTION. In recent years neural networks with hidden layers, or deep neural networks (DNNs), have found widespread application in a large number of pattern recognition problems, notably speech recognition and computer vision [1]. This resurgence in interest and application of neural networks has been driven by the availability of large data sets and i...

[2] Section 4 contains a description of the end-to-end training scheme of a neural network in log-domain as well as analysis relating bit-widths for fixed-point processing in the linear and log domains. Experimental results are summarized in Section 5 and conclusions provided in Section 6

[3] LOGARITHMIC NUMBER SYSTEM. In a LNS, a real number $v$ is represented by the logarithm of its absolute value and its sign. Thus,

$v \leftrightarrow V = (V, s_v)$ (1a)
$V = \log_2(|v|)$ (1b)
$s_v = \operatorname{sign}(v)$ (1c)

where $\operatorname{sign}(v) = 1$ if $v > 0$ and $0$ otherwise. Note that the radix of the logarithm does not change the important properties of LNS, but using radix 2 leads to bit-shift appro...
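The representation in (1a)-(1c) can be sketched as a small Python type. This is a minimal sketch: the names `LNS`, `encode`, `decode`, and `multiply` are hypothetical, and the treatment of zero (which has no finite logarithm) is an assumption of this sketch, since the excerpt does not show how the paper handles it.

```python
import math
from typing import NamedTuple

class LNS(NamedTuple):
    log_mag: float  # V = log2(|v|)
    sign: int       # s_v = 1 if v > 0, else 0 (the convention in the excerpt)

def encode(v: float) -> LNS:
    """Map a nonzero real number to its (log-magnitude, sign) pair."""
    if v == 0.0:
        raise ValueError("zero has no finite logarithm; it needs special handling")
    return LNS(math.log2(abs(v)), 1 if v > 0 else 0)

def decode(x: LNS) -> float:
    """Map a (log-magnitude, sign) pair back to a real number."""
    magnitude = 2.0 ** x.log_mag
    return magnitude if x.sign == 1 else -magnitude

def multiply(a: LNS, b: LNS) -> LNS:
    """Multiply: add the log-magnitudes; the result is positive when signs agree."""
    return LNS(a.log_mag + b.log_mag, 1 if a.sign == b.sign else 0)

result = decode(multiply(encode(-3.0), encode(4.0)))
```

Multiplication needs only one addition and a sign comparison, which is the property the scheme relies on to remove multipliers.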

[4] APPROXIMATE LOG-DOMAIN ADDITION. It is clear from (2) that LNS processing reduces the complexity of multiplication, but the $\Delta$ terms in (3) associated with log-domain addition are much more complex to implement than standard addition. Motivated by the fact that the training process is inherently noisy (e.g., gradient noise, finite precision effects, etc...
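To make the idea of a shift-based correction concrete, here is a minimal fixed-point sketch for same-sign operands. It swaps in a Mitchell-style approximation, log2(1 + u) ≈ u, combined with rounding the gap down to an integer; this is an illustrative stand-in and not necessarily the approximation the paper uses, since the excerpt is truncated before it is defined. `FRAC_BITS` and all function names are assumptions.

```python
FRAC_BITS = 8                 # assumed number of fractional bits (illustrative)
ONE = 1 << FRAC_BITS          # fixed-point representation of 1.0

def to_fixed(x: float) -> int:
    """Quantize a real value to fixed point with FRAC_BITS fractional bits."""
    return int(round(x * ONE))

def to_float(q: int) -> float:
    return q / ONE

def delta_plus_shift(d_fixed: int) -> int:
    """Approximate log2(1 + 2^-d) by 2^-floor(d), computed with one right shift.

    Relies on the Mitchell-style approximation log2(1 + u) ~ u for 0 <= u <= 1.
    """
    whole = d_fixed >> FRAC_BITS      # floor(d) for d >= 0
    return ONE >> whole               # 1.0 shifted right by floor(d) bits

def log_add_shift(x_fixed: int, y_fixed: int) -> int:
    """Approximate log2(2^x + 2^y) using only max, subtract, shift, and add."""
    hi, lo = max(x_fixed, y_fixed), min(x_fixed, y_fixed)
    return hi + delta_plus_shift(hi - lo)

# 2 + 8 in the log domain: the logs are 1.0 and 3.0, so the gap is d = 2.
x, y = to_fixed(1.0), to_fixed(3.0)
approx = 2.0 ** to_float(log_add_shift(x, y))
```

The approximation is coarse when the gap is below 1, which is where a small look-up table would typically be used instead; the sketch is meant only to show that the correction can be produced without a multiplier.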

[5] LOG-DOMAIN DNN TRAINING. Much of the computation associated with the feedforward and backpropagation operations is based on matrix multiplication. These can be implemented directly using the operations in Sections 2-3:

$z_i = \sum_j w_{i,j} x_j + b_i \;\leftrightarrow\; Z_i = \boxplus_j \left(W_{i,j} \boxdot X_j\right) \boxplus B_i$ (10)

In this section we describe log-domain versions of the other significant operations in the ...
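Equation (10) can be traced through a single neuron as follows. This is a minimal sketch that uses the exact log-domain add for clarity (an approximate version would replace the `log2` correction terms); all function names are hypothetical, and operands are assumed nonzero with no exact cancellation.

```python
import math

def to_lns(v):
    """Encode a nonzero real as (log2 |v|, sign bit); sign bit 1 means positive."""
    return (math.log2(abs(v)), 1 if v > 0 else 0)

def from_lns(t):
    mag = 2.0 ** t[0]
    return mag if t[1] == 1 else -mag

def lns_mul(a, b):
    """Log-domain multiply (the boxdot in Eq. 10): add logs, compare signs."""
    return (a[0] + b[0], 1 if a[1] == b[1] else 0)

def lns_add(a, b):
    """Exact log-domain add (the boxplus in Eq. 10)."""
    if a[0] < b[0]:
        a, b = b, a                 # make a the larger-magnitude operand
    d = a[0] - b[0]
    if a[1] == b[1]:
        return (a[0] + math.log2(1.0 + 2.0 ** (-d)), a[1])
    # Opposite signs; assumes the magnitudes differ so the sum is nonzero.
    return (a[0] + math.log2(1.0 - 2.0 ** (-d)), a[1])

def lns_neuron(weights, inputs, bias):
    """Z_i = boxplus_j (W_ij boxdot X_j) boxplus B_i, all operands in LNS form."""
    acc = bias
    for w, x in zip(weights, inputs):
        acc = lns_add(acc, lns_mul(w, x))
    return acc

w = [to_lns(v) for v in (0.5, -2.0, 1.5)]
x = [to_lns(v) for v in (4.0, 1.0, 2.0)]
z = from_lns(lns_neuron(w, x, to_lns(0.25)))
```

Decoding `z` gives the same value as the linear-domain expression 0.5·4 − 2·1 + 1.5·2 + 0.25, up to floating-point rounding.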

[6] NUMERICAL EXPERIMENTS. The neural network trained is an MLP with one input layer of 784 neurons, one hidden layer of 100 neurons, and one softmax layer with number of neurons equal to the number of classes for the given dataset. Stochastic gradient descent was used with mini-batch size of 5 and learning rate of 0.01. The weight decay regularization const...
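The setup described above can be captured as a small configuration object. This is a sketch only: `MLPConfig` is a hypothetical name, the 10-class instantiation is an assumption for illustration, and the weight-decay constant is omitted because it is truncated in the excerpt.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MLPConfig:
    num_classes: int             # depends on the dataset
    input_size: int = 784        # from the excerpt
    hidden_size: int = 100       # from the excerpt
    batch_size: int = 5          # from the excerpt
    learning_rate: float = 0.01  # from the excerpt
    # The weight-decay constant is truncated in the excerpt, so it is left out.

    def num_parameters(self) -> int:
        """Weights plus biases for the two fully connected layers."""
        hidden = self.input_size * self.hidden_size + self.hidden_size
        output = self.hidden_size * self.num_classes + self.num_classes
        return hidden + output

cfg = MLPConfig(num_classes=10)  # 10 classes is an assumption (e.g. digit classification)
```

Calling `cfg.num_parameters()` gives a quick sense of how many log-domain values the training loop has to update per step.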

[7] CONCLUSIONS. Our results demonstrate that all training and inference processing associated with a neural network can be performed using logarithmic number system with approximate log-domain additions, thus allowing a hardware implementation without multipliers. In particular, approximating the log-domain addition using a max(·), add, and an approximati...

[8] Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, MIT Press, 2016

[9] Xiaolong Ma, Fu-Ming Guo, Wei Niu, Xue Lin, Jian Tang, Kaisheng Ma, Bin Ren, and Yanzhi Wang, “PCONV: The missing but desirable sparsity in DNN weight pruning for real-time execution on mobile devices,” http://arxiv.org/abs/1909.05073, 2019

[10] S. Dey, K. Huang, P. A. Beerel, and K. M. Chugg, “Pre-defined sparse neural networks with hardware acceleration,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019

[11] Mahdi Nazemi, Ghasem Pasandi, and Massoud Pedram, “NullaNet: Training deep neural networks for reduced-memory-access inference,” CoRR, vol. abs/1807.08716, 2018

[12] Sourya Dey, Yinan Shao, Keith M. Chugg, and Peter A. Beerel, “Accelerating training of deep neural networks via sparse edge processing,” in Artificial Neural Networks and Machine Learning – ICANN 2017, 2017

[13] Souvik Kundu, Saurav Prakash, Haleh Akrami, Peter A. Beerel, and Keith M. Chugg, “A Pre-defined Sparse Kernel Based Convolution for Deep CNNs,” http://arxiv.org/abs/1910.00724, 2019

[14] Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen, “Incremental network quantization: Towards lossless CNNs with low-precision weights,” arXiv preprint arXiv:1702.03044, 2017

[15] S. C. Lee and A. D. Edgar, “Addendum to ‘the focus number system’,” IEEE Transactions on Computers, 1979

[16] E. E. Swartzlander and A. G. Alexopoulos, “The sign/logarithm number system,” IEEE Transactions on Computers, 1975

[17] Patrick Robertson, Emmanuelle Villebrun, Peter Hoeher, et al., “A comparison of optimal and sub-optimal MAP decoding algorithms operating in the log domain,” in IEEE International Conference on Communications, 1995

[18] R. C. Ismail and J. N. Coleman, “ROM-less LNS,” in 2011 IEEE 20th Symposium on Computer Arithmetic, 2011

[19] H. Fu, O. Mencer, and W. Luk, “Comparing floating-point and logarithmic number representations for reconfigurable acceleration,” in 2006 IEEE International Conference on Field Programmable Technology, 2006

[20] N. G. Kingsbury and P. J. W. Rayner, “Digital filtering using logarithmic arithmetic,” Electronics Letters, 1971

[21] M. G. Arnold, T. A. Bailey, J. J. Cupal, and M. D. Winkel, “On the cost effectiveness of logarithmic arithmetic for backpropagation training on SIMD processors,” in Proceedings of International Conference on Neural Networks (ICNN’97), 1997

[22] M. Arnold, J. Cowles, T. Bailey, and J. Cupal, “Implementing back propagation neural nets with logarithmic arithmetic,” International AMSE conference on Neural Nets, San Diego, 1991

[23] E. H. Lee, D. Miyashita, E. Chai, B. Murmann, and S. S. Wong, “Lognet: Energy-efficient neural networks using logarithmic computation,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 2017, pp. 5900–5904

[24] Daisuke Miyashita, Edward H. Lee, and Boris Murmann, “Convolutional neural networks using logarithmic data representation,” http://arxiv.org/abs/1603.01025, 2016

[25] Jeff Johnson, “Rethinking floating point for deep learning,” CoRR, vol. abs/1811.01721, 2018

[26] John Gustafson and Isaac Yonemoto, “Beating floating point at its own game: Posit arithmetic,” Supercomputing Frontiers and Innovations, vol. 4, no. 2, 2017

[27] K. He, X. Zhang, S. Ren, and J. Sun, “Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification,” in 2015 IEEE International Conference on Computer Vision (ICCV), 2015

[28] “Deep Neural Networks multi-layer perceptron implementation using Logarithmic Number System,” https://github.com/usc-hal/lnsdnn.git

[29] Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner, et al., “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, 1998

[30] Han Xiao, Kashif Rasul, and Roland Vollgraf, “Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms,” arXiv preprint arXiv:1708.07747, 2017

[31] Gregory Cohen, Saeed Afshar, Jonathan Tapson, and André van Schaik, “EMNIST: an extension of MNIST to handwritten letters,” arXiv preprint arXiv:1702.05373, 2017
