Learning to Learn without Gradient Descent by Gradient Descent
read the original abstract
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade-off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
Learning to learn with quantum neural networks via classical neural networks
Classical RNNs trained on small instances provide parameter initializations for QAOA and VQE that reduce total optimization iterations and generalize across problem sizes.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.