FTerViT introduces fully ternary Vision Transformers with TernaryBitConv2d and TernaryLayerNorm operators, achieving 82.43% ImageNet top-1 at 6.09 MB with 15x compression.
and Bengio, Y
16 Pith papers cite this work. Polarity classification is still indexing.
abstract
We introduce a method to train Binarized Neural Networks (BNNs): neural networks with binary weights and activations at run time. At training time, the binary weights and activations are used for computing the parameter gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power efficiency. To validate the effectiveness of BNNs, we conduct two sets of experiments on the Torch7 and Theano frameworks. On both, BNNs achieved nearly state-of-the-art results on the MNIST, CIFAR-10 and SVHN datasets. Last but not least, we wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The code for training and running our BNNs is available online.
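To make the abstract's claim about replacing arithmetic with bit-wise operations concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' kernel: the helper names (`binarize`, `pack_bits`, `xnor_popcount_dot`), the choice to map zero to +1, and the bit-packing layout are all hypothetical. It shows that a dot product of two vectors with entries in {+1, -1} can be computed with XNOR and a population count instead of multiplications and additions.

```python
def binarize(values):
    """Deterministic sign binarization: each value becomes +1 or -1.
    Mapping 0 to +1 is an assumption made for this sketch."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a +1/-1 vector into one integer: +1 -> bit 1, -1 -> bit 0.
    Element i is stored in bit i (an illustrative layout)."""
    mask = 0
    for i, s in enumerate(signs):
        if s == 1:
            mask |= 1 << i
    return mask


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.
    XNOR marks positions where the signs agree; each agreement
    contributes +1 and each disagreement -1, so the result is
    2 * agreements - n."""
    full = (1 << n) - 1                 # keep only the n valid bits
    agree = ~(a_bits ^ b_bits) & full   # XNOR restricted to n bits
    matches = bin(agree).count("1")     # population count
    return 2 * matches - n


# Hypothetical real-valued weights and activations.
weights = [0.7, -1.2, 0.1, -0.3]
activations = [-0.5, -0.9, 2.0, 0.4]

w_b = binarize(weights)        # [1, -1, 1, -1]
a_b = binarize(activations)    # [-1, -1, 1, 1]

reference = sum(w * a for w, a in zip(w_b, a_b))
bitwise = xnor_popcount_dot(pack_bits(w_b), pack_bits(a_b), len(w_b))
print(reference, bitwise)
```

Both values agree because the bit-wise form is an exact rewrite of the ±1 dot product. An optimized kernel would apply the same identity to machine words in hardware rather than to Python integers.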
representative citing papers
Proposes first stochastic-computing DNN acceleration framework tailored to AQFP superconducting technology.
Introduces the first heterogeneous multi-source mmWave point cloud HAR dataset and DAP-Net architecture with Doppler reparameterization and text alignment for cross-source robustness.
LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.
A progressive training scheme with binary-aware initialization and dual-scaling allows pre-trained LLMs to be converted to high-performance 1-bit models without training from scratch.
Proves polynomial-in-width and exponential-in-depth lower bounds on linear regions for ternary ReLU regression networks, with width-doubling constructions achieving bounds comparable to unrestricted ReLU networks.
A co-evolutionary compression technique reduces parameters and FLOPs in unpaired image-to-image translation GAN generators while maintaining translation quality on benchmarks.
Neural networks are trained as timing models of programs and analyzed via MILP to detect and quantify timing side-channel information leaks.
Radix-5 memristor crossbar CNN accelerator reaches 90.5% CIFAR-10 accuracy with 46% area reduction by using variable memristor counts and single-column signed weights.
LBLLM achieves better accuracy than prior binarization methods for LLMs by decoupling weight and activation quantization through initialization, layer-wise distillation, and learnable activation scaling.
Replacing pointwise convolutions with DWHT yields a model with 79.1% fewer parameters, 48.4% fewer FLOPs, and 1.49% higher accuracy than MobileNet-V1 on CIFAR-100.
CNNs applied to global history improve prediction accuracy for hard-to-predict branches in SPEC 2017, with hardware-adapted inference and reusability across inputs.
HTAF is a sigmoid-tanh composite that approximates the Heaviside function to allow stable gradient training of binary activation networks, yielding ICBMs with stable discretization and competitive performance on image tasks.
A BNN-based YOLOv3-tiny-like object detector with 1-bit weights and 8-bit activations is implemented in Verilog on FPGA, achieving 39.6% mAP50 on VOC and 0.999964 correlation with the ONNX model in RTL simulation.
A comprehensive review of deep learning techniques for computational mechanics, including LSTM for constitutive modeling, PINNs for PDE solving, optimizers, and kernel methods.
citing papers explorer
- DAP: Doppler-aware Point Network for Heterogeneous mmWave Action Recognition
  Introduces the first heterogeneous multi-source mmWave point cloud HAR dataset and DAP-Net architecture with Doppler reparameterization and text alignment for cross-source robustness.
- Design and Implementation of BNN-Based Object Detection on FPGA
  A BNN-based YOLOv3-tiny-like object detector with 1-bit weights and 8-bit activations is implemented in Verilog on FPGA, achieving 39.6% mAP50 on VOC and 0.999964 correlation with the ONNX model in RTL simulation.