SMASH: One-Shot Model Architecture Search through HyperNetworks

Andrew Brock; J.M. Ritchie; Nick Weston; Theodore Lim

arXiv: 1708.05344 · v1 · pith:IN6KYS6Lnew · submitted 2017-08-17 · cs.LG

classification: cs.LG

keywords: architecture, model, networks, search, smash, architectures, performance, range

Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with similarly-sized hand-designed networks. Our code is available at https://github.com/ajbrock/SMASH
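
The one-shot ranking idea in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper or its repository: the "architecture" is a binary mask over four input features, the hypothetical `LinearHyperNet` is a plain linear map from that mask to the main model's weights, and the task is a synthetic regression problem. The sketch trains the hypernetwork once while sampling a random architecture at every step, then ranks all candidate architectures by validation loss using only generated weights.

```python
import itertools
import random

random.seed(0)

NUM_FEATURES = 4
# Toy ground truth (an assumption of this sketch): only features 0 and 1 matter.
TRUE_COEFS = [2.0, -3.0, 0.0, 0.0]


def make_data(n):
    """Draw n toy regression examples (x, y) with y = TRUE_COEFS . x."""
    data = []
    for _ in range(n):
        x = [random.uniform(-1.0, 1.0) for _ in range(NUM_FEATURES)]
        y = sum(w * xi for w, xi in zip(TRUE_COEFS, x))
        data.append((x, y))
    return data


class LinearHyperNet:
    """Hypothetical hypernetwork: maps an encoding c to weights w = A c + b."""

    def __init__(self, dim):
        self.A = [[0.0] * dim for _ in range(dim)]
        self.b = [0.0] * dim

    def generate(self, c):
        return [sum(a * cj for a, cj in zip(row, c)) + bi
                for row, bi in zip(self.A, self.b)]


def predict(c, w, x):
    """Main model: a linear unit that only sees the inputs enabled by mask c."""
    return sum(ci * wi * xi for ci, wi, xi in zip(c, w, x))


def train_hypernet(hyper, train_data, steps=5000, lr=0.02):
    """Single training run: a fresh random architecture is sampled each step."""
    for _ in range(steps):
        c = [random.randint(0, 1) for _ in range(NUM_FEATURES)]
        x, y = random.choice(train_data)
        w = hyper.generate(c)
        err = predict(c, w, x) - y
        for i in range(NUM_FEATURES):
            grad_wi = 2.0 * err * c[i] * x[i]  # d(err^2) / d(w_i)
            hyper.b[i] -= lr * grad_wi          # chain rule through w = A c + b
            for j in range(NUM_FEATURES):
                hyper.A[i][j] -= lr * grad_wi * c[j]


def validation_loss(hyper, c, val_data):
    """Mean squared error of architecture c using hypernet-generated weights."""
    w = hyper.generate(c)
    return sum((predict(c, w, x) - y) ** 2 for x, y in val_data) / len(val_data)


hyper = LinearHyperNet(NUM_FEATURES)
train_hypernet(hyper, make_data(200))
val_data = make_data(100)

# Rank every candidate architecture without training any of them individually.
candidates = list(itertools.product([0, 1], repeat=NUM_FEATURES))
ranked = sorted(candidates, key=lambda c: validation_loss(hyper, c, val_data))
best = ranked[0]
print("best mask:", best)
```

In this toy setting the top-ranked mask should enable the two informative features, which mirrors the premise that relative validation performance under generated weights can guide the choice of architecture. In the paper's method the selected architecture is judged by validation performance on image benchmarks; the linear model and mask encoding above stand in for that only as a minimal illustration.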

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RELO: Reinforcement Learning to Localize for Visual Object Tracking
cs.CV 2026-05 unverdicted novelty 6.0

RELO replaces handcrafted spatial priors with a reinforcement learning policy for target localization in visual tracking and reports 57.5% AUC on LaSOText without template updates.

RELO: Reinforcement Learning to Localize for Visual Object Tracking
cs.CV 2026-05 unverdicted novelty 6.0

RELO formulates visual object tracking localization as a Markov decision process solved by reinforcement learning with combined IoU and AUC rewards, augmented by layer-aligned temporal token propagation, and reports 5...

TabICL: A Tabular Foundation Model for In-Context Learning on Large Data
cs.LG 2025-02 unverdicted novelty 6.0

TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datase...

Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
cs.CL 2024-02 conditional novelty 6.0

DPOP is a new loss function that prevents DPO from lowering preferred response likelihoods and outperforms standard DPO on diverse datasets, MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.

Representation-Aligned Multi-Scale Personalization for Federated Learning
cs.LG 2026-04 unverdicted novelty 5.0

FRAMP generates client-specific models from compact descriptors in federated learning, trains tailored submodels, and aligns representations to balance personalization with global consistency.

Spiking Neural Network Architecture Search: A Survey
cs.NE 2025-10 unverdicted novelty 2.0

A survey of Spiking Neural Network architecture search techniques viewed through a hardware/software co-design lens.