Learning to Learn without Gradient Descent by Gradient Descent
Classification: stat.ML, cs.LG
Keywords: descent, gradient, learn, optimizers, functions, hyper-parameter, learned, optimization
Abstract
We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
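To make the idea concrete, here is a minimal sketch of the setup the abstract describes: an RNN that, given the previous query point and its observed function value, proposes the next query for a derivative-free black-box function, and is itself meta-trained by gradient descent through the unrolled queries. All names, dimensions, and the random-quadratic stand-in for the paper's synthetic (Gaussian-process-sampled) training functions are illustrative assumptions, not the authors' reference implementation.

```python
# Sketch only: an LSTM "optimizer" for black-box functions, meta-trained
# by backpropagating through its own sequence of queries. Assumes PyTorch.
import torch
import torch.nn as nn

class RNNOptimizer(nn.Module):
    def __init__(self, dim, hidden=32):
        super().__init__()
        self.cell = nn.LSTMCell(dim + 1, hidden)  # input: last query x and f(x)
        self.head = nn.Linear(hidden, dim)        # output: next query point

    def forward(self, x, y, state):
        h, c = self.cell(torch.cat([x, y], dim=-1), state)
        return self.head(h), (h, c)

def rollout(opt_net, f, dim, horizon=20):
    """Unroll the learned optimizer on black-box f; return observed values."""
    x = torch.zeros(1, dim)
    y = f(x).view(1, 1)
    state = None  # LSTMCell defaults hidden/cell state to zeros
    ys = [y]
    for _ in range(horizon):
        x, state = opt_net(x, y, state)
        y = f(x).view(1, 1)
        ys.append(y)
    return torch.cat(ys)

# Meta-training loop: sample a fresh synthetic function each step and
# minimize the sum of observed values over the horizon, so low function
# values early in the rollout are rewarded (this is where the learned
# exploration/exploitation trade-off emerges in the paper).
dim = 2
opt_net = RNNOptimizer(dim)
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-3)
for step in range(1000):
    w = torch.randn(dim)                      # random quadratic as a stand-in
    f = lambda x: ((x - w) ** 2).sum(dim=-1)  # for GP-sampled training functions
    loss = rollout(opt_net, f, dim, horizon=20).sum()
    meta_opt.zero_grad()
    loss.backward()
    meta_opt.step()
```

At test time one would run `rollout` on a genuinely black-box objective (e.g. a hyper-parameter tuning task), where only function evaluations, not gradients of the objective, are needed; the gradients above flow only through the meta-training functions, which are chosen to be differentiable.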