arxiv: 1412.6553 · v3 · pith:6AMI5LN2new · submitted 2014-12-19 · 💻 cs.CV · cs.LG

Speeding-up Convolutional Neural Networks Using Fine-tuned CP-Decomposition

classification 💻 cs.CV cs.LG
keywords approach, convolutional, convolution, layer, networks, accuracy, classification, cost

We propose a simple two-step approach for speeding up convolution layers within large convolutional neural networks, based on tensor decomposition and discriminative fine-tuning. In the first step, given a layer, we use non-linear least squares to compute a low-rank CP-decomposition of the 4D convolution kernel tensor into a sum of a small number of rank-one tensors. In the second step, this decomposition is used to replace the original convolutional layer with a sequence of four convolutional layers with small kernels. After this replacement, the entire network is fine-tuned on the training data using standard backpropagation. We evaluate this approach on two CNNs and show that it is competitive with previous approaches, achieving higher CPU speedups with lower accuracy drops for the smaller of the two networks. For the 36-class character classification CNN, our approach obtains an 8.5x CPU speedup of the whole network with only a minor accuracy drop (1%, from 91% to 90%). For the standard ImageNet architecture (AlexNet), the approach speeds up the second convolution layer by a factor of 4x at the cost of a 1% increase in the overall top-5 classification error.
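
The replacement of one convolution by four small ones can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it makes several assumptions: the function names (`full_conv`, `cp_conv`, `param_counts`), the factor names (`Ks`, `Ky`, `Kx`, `Kt`), the axis ordering, and the use of "valid" cross-correlation without stride or padding are all illustrative choices. The kernel here is built directly from random rank-R factors, so the two paths agree exactly; the sketch does not perform the non-linear least squares fit or the fine-tuning step described in the abstract.

```python
import numpy as np

def full_conv(U, K):
    """Valid cross-correlation. U: (S, H, W) input, K: (T, S, d, d) kernel."""
    T, S, d, _ = K.shape
    H, W = U.shape[1:]
    Ho, Wo = H - d + 1, W - d + 1
    out = np.zeros((T, Ho, Wo))
    for i in range(d):
        for j in range(d):
            patch = U[:, i:i + Ho, j:j + Wo]                  # (S, Ho, Wo)
            out += np.einsum('ts,shw->thw', K[:, :, i, j], patch)
    return out

def cp_conv(U, Ks, Ky, Kx, Kt):
    """Same mapping as a sequence of four small convolutions.

    Assumed factor shapes: Ks (S, R), Ky (d, R), Kx (d, R), Kt (T, R).
    """
    d = Ky.shape[0]
    H, W = U.shape[1:]
    Ho, Wo = H - d + 1, W - d + 1
    # 1) 1x1 convolution: S input channels -> R channels
    u1 = np.einsum('sr,shw->rhw', Ks, U)
    # 2) d x 1 vertical convolution, applied per channel (depthwise)
    u2 = sum(Ky[i][:, None, None] * u1[:, i:i + Ho, :] for i in range(d))
    # 3) 1 x d horizontal convolution, applied per channel (depthwise)
    u3 = sum(Kx[j][:, None, None] * u2[:, :, j:j + Wo] for j in range(d))
    # 4) 1x1 convolution: R channels -> T output channels
    return np.einsum('tr,rhw->thw', Kt, u3)

def param_counts(S, T, d, R):
    """Weights in the full kernel vs. in the four CP factors."""
    return T * S * d * d, R * (S + 2 * d + T)

# Small demo with hypothetical sizes.
rng = np.random.default_rng(0)
S, T, d, R = 4, 6, 3, 5
Ks, Kt = rng.normal(size=(S, R)), rng.normal(size=(T, R))
Ky, Kx = rng.normal(size=(d, R)), rng.normal(size=(d, R))
# Rank-R kernel: a sum of R rank-one 4D tensors.
K = np.einsum('tr,sr,ir,jr->tsij', Kt, Ks, Ky, Kx)
U = rng.normal(size=(S, 8, 8))
print(np.allclose(full_conv(U, K), cp_conv(U, Ks, Ky, Kx, Kt)))
print(param_counts(S, T, d, R))
```

In this sketch the per-pixel multiply count drops from roughly T·S·d² for the full kernel to R·(S + 2d + T) for the four-layer sequence, which suggests where a speedup could come from when R is small. For a trained kernel the decomposition is only approximate, which is why the abstract follows it with fine-tuning.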

This paper has not been read by Pith yet.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications

    cs.CV 2017-04 accept novelty 7.0

    MobileNets introduce depthwise separable convolutions plus width and resolution multipliers to produce efficient CNNs that trade off latency and accuracy for mobile and embedded vision applications.

  2. ASVD: Activation-aware Singular Value Decomposition for Compressing Large Language Models

    cs.CL 2023-12 unverdicted novelty 6.0

    ASVD compresses LLMs by 10-30% and KV caches by 50% via activation-aware SVD that absorbs outliers into transformed weights and calibrates per-layer sensitivity.

  3. Fast Tensorization of Neural Networks via Slice-wise Feature Distillation

    cs.LG 2026-05 unverdicted novelty 5.0

    A slice-wise feature distillation framework for independent tensorization of neural network slices to achieve scalable compression with reduced fine-tuning costs.

  4. Vanishing Contributions: A Unified Framework for Smooth and Iterative Model Compression

    cs.LG 2025-10 unverdicted novelty 5.0

    VCON is a unified framework for smooth iterative DNN compression that uses parallel execution and an affine combination to progressively replace the original model with its compressed form during fine-tuning.

  5. Tucker Tensor Decomposition on FPGA

    eess.SP 2019-06 unverdicted novelty 5.0

    FPGA accelerator for Tucker decomposition reports 2.16-30.2x speedup versus CPU/GPU toolboxes on cardiac MRI data via fixed-point design and warm-start SVD.