Slimmable neural networks

Jiahui Yu, Linjie Yang, Ning Xu, Jianchao Yang, Thomas Huang · 2018 · cs.CV · arXiv 1812.08928

5 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present a simple and general method to train a single neural network executable at different widths (number of channels in a layer), permitting instant and adaptive accuracy-efficiency trade-offs at runtime. Instead of training individual networks with different width configurations, we train a shared network with switchable batch normalization. At runtime, the network can adjust its width on the fly according to on-device benchmarks and resource constraints, rather than downloading and offloading different models. Our trained networks, named slimmable neural networks, achieve similar (and in many cases better) ImageNet classification accuracy than individually trained models of MobileNet v1, MobileNet v2, ShuffleNet and ResNet-50 at different widths respectively. We also demonstrate better performance of slimmable models compared with individual ones across a wide range of applications including COCO bounding-box object detection, instance segmentation and person keypoint detection without tuning hyper-parameters. Lastly we visualize and discuss the learned features of slimmable networks. Code and models are available at: https://github.com/JiahuiYu/slimmable_networks
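The idea in the abstract is one set of shared weights with a separate set of batch-normalization statistics per width. It can be illustrated with a toy sketch. This is not the authors' implementation (that is in the repository linked above); it is a minimal pure-Python illustration under stated assumptions. The class name `SlimmableLinear`, its methods, and the width multipliers are all hypothetical, the layer is fully connected rather than convolutional, and the learnable scale and shift of batch normalization are omitted.

```python
import random


class SlimmableLinear:
    """Toy layer: shared weights, one set of normalization statistics per width.

    Hypothetical illustration only; not the API of the paper's code release.
    """

    def __init__(self, in_features, out_features, width_mults, seed=0):
        rng = random.Random(seed)
        self.in_features = in_features
        self.out_features = out_features
        # A single weight matrix; narrower widths reuse its leading rows.
        self.weight = [[rng.uniform(-1.0, 1.0) for _ in range(in_features)]
                       for _ in range(out_features)]
        # "Switchable" normalization: private running mean/variance per width.
        self.running = {}
        for mult in width_mults:
            n = self.channels(out_features, mult)
            self.running[mult] = {"mean": [0.0] * n, "var": [1.0] * n}
        self.width_mult = max(width_mults)

    @staticmethod
    def channels(full, mult):
        # Number of active output channels at a given width multiplier.
        return max(1, int(full * mult))

    def set_width(self, mult):
        if mult not in self.running:
            raise ValueError(f"unsupported width multiplier: {mult}")
        self.width_mult = mult

    def forward(self, batch, training=False, momentum=0.1, eps=1e-5):
        if len(batch[0]) > self.in_features:
            raise ValueError("input has more features than the layer accepts")
        n_out = self.channels(self.out_features, self.width_mult)
        # zip() stops at the input length, so a slimmed input uses only the
        # leading columns of each weight row.
        pre = [[sum(w * x for w, x in zip(self.weight[o], row))
                for o in range(n_out)] for row in batch]
        stats = self.running[self.width_mult]
        means, variances = stats["mean"], stats["var"]
        if training:
            means, variances = [], []
            for o in range(n_out):
                col = [row[o] for row in pre]
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col)
                means.append(mean)
                variances.append(var)
                # Only the statistics of the active width are updated.
                stats["mean"][o] = (1 - momentum) * stats["mean"][o] + momentum * mean
                stats["var"][o] = (1 - momentum) * stats["var"][o] + momentum * var
        return [[(row[o] - means[o]) / (variances[o] + eps) ** 0.5
                 for o in range(n_out)] for row in pre]


layer = SlimmableLinear(4, 8, width_mults=[0.25, 0.5, 1.0])
x = [[1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 0.0, 2.0]]
for mult in (0.25, 0.5, 1.0):
    layer.set_width(mult)
    print(mult, len(layer.forward(x)[0]))  # active output channels: 2, 4, 8
```

Switching width here only changes which slice of the shared weights and which statistics are read, so no second model has to be loaded. Keeping the statistics separate per width is the part this sketch is meant to highlight.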

citation-role summary

background: 1 · dataset: 1

citation-polarity summary

background: 1 · use dataset: 1

representative citing papers

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

MatryoshkaLoRA inserts a crafted diagonal matrix P into LoRA to learn accurate nested low-rank adapters that support dynamic rank selection with minimal performance drop.

NaviSlim: Adaptive Context-Aware Navigation and Sensing via Dynamic Slimmable Networks

cs.RO · 2024-05-16 · unverdicted · novelty 6.0

NaviSlim uses a gated slimmable architecture to dynamically scale neural model complexity and onboard sensor power for context-aware navigation in micro-drones, reporting 57-92% average model reduction and 61-80% sensor utilization in AirSim simulations versus static full-complexity baselines.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

Objective-Specific Privileged Bases via Full-Prefix Matryoshka Learning

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

Full-prefix Matryoshka Representation Learning recovers ordered principal directions in the linear case and yields consistent per-dimension task-aligned structure.

CADENCE: Context-Adaptive Depth Estimation for Navigation and Computational Efficiency

cs.RO · 2026-04-08 · unverdicted · novelty 4.0

CADENCE dynamically adjusts a slimmable depth estimation network's computational load according to context, cutting energy expenditure by 75% and boosting navigation accuracy by 7.43% versus static baselines.

citing papers explorer

Showing 5 of 5 citing papers.

MatryoshkaLoRA: Learning Accurate Hierarchical Low-Rank Representations for LLM Fine-Tuning · cs.CL · 2026-05-08 · unverdicted · none · ref 21
NaviSlim: Adaptive Context-Aware Navigation and Sensing via Dynamic Slimmable Networks · cs.RO · 2024-05-16 · unverdicted · none · ref 9 · internal anchor
Elastic Attention Cores for Scalable Vision Transformers · cs.CV · 2026-05-12 · unverdicted · none · ref 30
Objective-Specific Privileged Bases via Full-Prefix Matryoshka Learning · cs.LG · 2026-05-09 · unverdicted · none · ref 5
CADENCE: Context-Adaptive Depth Estimation for Navigation and Computational Efficiency · cs.RO · 2026-04-08 · unverdicted · none · ref 12
