Recognition: unknown
Stochastic Hyperparameter Optimization through Hypernetworks
read the original abstract
Machine learning models are often tuned by nesting optimization of model weights inside the optimization of hyperparameters. We give a method to collapse this nested optimization into joint stochastic optimization of weights and hyperparameters. Our process trains a neural network to output approximately optimal weights as a function of hyperparameters. We show that our technique converges to locally optimal weights and hyperparameters for sufficiently large hypernetworks. We compare this method to standard hyperparameter optimization strategies and demonstrate its effectiveness for tuning thousands of hyperparameters.
This paper has not been read by Pith yet.
Forward citations
Cited by 2 Pith papers
-
On the Stability and Generalization of First-order Bilevel Minimax Optimization
Provides the first systematic generalization analysis via algorithmic stability for single-timescale and two-timescale stochastic gradient descent-ascent in bilevel minimax problems.
-
Fine-grained Analysis of Stability and Generalization for Stochastic Bilevel Optimization
Derives upper bounds on on-average argument stability for single- and two-timescale SGD in bilevel optimization under NC-NC, C-C, and SC-SC regimes, linking stability directly to generalization gaps.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.