arxiv: 1708.05344 · v1 · pith:IN6KYS6Lnew · submitted 2017-08-17 · 💻 cs.LG

SMASH: One-Shot Model Architecture Search through HyperNetworks

classification 💻 cs.LG
keywords architecture · model · networks · search · smash · architectures · performance · range

Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main model conditioned on that model's architecture. By comparing the relative validation performance of networks with HyperNet-generated weights, we can effectively search over a wide range of architectures at the cost of a single training run. To facilitate this search, we develop a flexible mechanism based on memory read-writes that allows us to define a wide range of network connectivity patterns, with ResNet, DenseNet, and FractalNet blocks as special cases. We validate our method (SMASH) on CIFAR-10 and CIFAR-100, STL-10, ModelNet10, and Imagenet32x32, achieving competitive performance with similarly-sized hand-designed networks. Our code is available at https://github.com/ajbrock/SMASH
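The core idea in the abstract, ranking candidate architectures by validation performance under HyperNet-generated weights instead of training each one, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the SMASH code: the names `hypernet`, `main_model`, `validation_error`, and `rank_architectures`, the binary architecture encoding, the linear HyperNet, and the hand-set `hyper_weights` that stand in for a trained HyperNet. Training of the HyperNet is omitted entirely.

```python
# Toy sketch only: a linear "HyperNet" maps an architecture encoding to weights,
# and candidates are ranked by validation error using those generated weights.

def hypernet(arch_code, hyper_weights):
    """Hypothetical HyperNet: one generated weight per row of hyper_weights."""
    return [sum(h * a for h, a in zip(row, arch_code)) for row in hyper_weights]

def main_model(x, weights, arch_code):
    """Toy main model: weighted sum over the inputs the architecture enables."""
    return sum(w * xi * a for w, xi, a in zip(weights, x, arch_code))

def validation_error(arch_code, hyper_weights, val_data):
    """Mean squared error of the main model under HyperNet-generated weights."""
    weights = hypernet(arch_code, hyper_weights)
    errors = [(main_model(x, weights, arch_code) - y) ** 2 for x, y in val_data]
    return sum(errors) / len(errors)

def rank_architectures(candidates, hyper_weights, val_data):
    """Sort candidates from lowest to highest validation error (no per-candidate training)."""
    return sorted(candidates, key=lambda c: validation_error(c, hyper_weights, val_data))

# Hand-set stand-in for a trained HyperNet (assumption: identity mapping).
hyper_weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Validation targets follow y = x0 + x1, so the third input is irrelevant.
val_data = [((1.0, 2.0, 3.0), 3.0), ((0.5, 1.0, -1.0), 1.5), ((2.0, 0.0, 1.0), 2.0)]
candidates = [(1, 0, 0), (1, 1, 1), (1, 1, 0)]

best = rank_architectures(candidates, hyper_weights, val_data)[0]
print(best)
```

In this toy setup only the relative ordering of candidates matters, which mirrors the abstract's emphasis on comparing relative validation performance rather than absolute accuracy.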

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. RELO: Reinforcement Learning to Localize for Visual Object Tracking

    cs.CV 2026-05 unverdicted novelty 6.0

    RELO replaces handcrafted spatial priors with a reinforcement learning policy for target localization in visual tracking and reports 57.5% AUC on LaSOT_ext without template updates.

  2. RELO: Reinforcement Learning to Localize for Visual Object Tracking

    cs.CV 2026-05 unverdicted novelty 6.0

    RELO formulates visual object tracking localization as a Markov decision process solved by reinforcement learning with combined IoU and AUC rewards, augmented by layer-aligned temporal token propagation, and reports 5...

  3. TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

    cs.LG 2025-02 unverdicted novelty 6.0

    TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datase...

  4. Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

    cs.CL 2024-02 conditional novelty 6.0

    DPOP is a new loss function that prevents DPO from lowering preferred-response likelihoods, outperforms standard DPO on diverse datasets and MT-Bench, and enables Smaug-72B to exceed 80% on the Open LLM Leaderboard.

  5. Representation-Aligned Multi-Scale Personalization for Federated Learning

    cs.LG 2026-04 unverdicted novelty 5.0

    FRAMP generates client-specific models from compact descriptors in federated learning, trains tailored submodels, and aligns representations to balance personalization with global consistency.

  6. Spiking Neural Network Architecture Search: A Survey

    cs.NE 2025-10 unverdicted novelty 2.0

    A survey of Spiking Neural Network architecture search techniques viewed through a hardware/software co-design lens.