Learning to Learn without Gradient Descent by Gradient Descent

Matt Botvinick; Matthew W. Hoffman; Misha Denil; Nando de Freitas; Sergio Gomez Colmenarejo; Timothy P. Lillicrap; Yutian Chen

arxiv: 1611.03824 · v6 · pith:X7YRPS7Znew · submitted 2016-11-11 · 📊 stat.ML · cs.LG

Yutian Chen, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas

keywords: descent, gradient, learn, optimizers, functions, hyper-parameter, learned, optimization

We learn recurrent neural network optimizers trained on simple synthetic functions by gradient descent. We show that these learned optimizers exhibit a remarkable degree of transfer in that they can be used to efficiently optimize a broad range of derivative-free black-box functions, including Gaussian process bandits, simple control objectives, global optimization benchmarks and hyper-parameter tuning tasks. Up to the training horizon, the learned optimizers learn to trade off exploration and exploitation, and compare favourably with heavily engineered Bayesian optimization packages for hyper-parameter tuning.
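As a rough illustration of what a recurrent optimizer for black-box functions looks like at query time, the sketch below unrolls a tiny network that proposes a point, observes only the function value, and feeds both back in at the next step. Everything here is an assumption made for illustration: the plain tanh cell (not necessarily the architecture used in the paper), the names init_params, rollout and sphere, the weight layout, and the random untrained weights. The training step, which the abstract says is done by gradient descent on synthetic functions, is omitted.

```python
import math
import random


def init_params(dim, hidden, seed=0):
    """Random, untrained weights for a tiny tanh RNN (hypothetical layout)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "W_in": mat(hidden, dim + 1),  # input: previous query and its value
        "W_h": mat(hidden, hidden),    # recurrent weights
        "b": [0.0] * hidden,           # hidden bias
        "W_out": mat(dim, hidden),     # hidden state -> next query point
    }


def matvec(m, v):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rollout(f, params, dim, horizon):
    """Query the black-box f for `horizon` steps; no derivatives of f are used."""
    h = [0.0] * len(params["b"])
    x, y = [0.0] * dim, 0.0  # dummy first input
    queries, values = [], []
    for _ in range(horizon):
        pre = [
            a + b + c
            for a, b, c in zip(
                matvec(params["W_in"], x + [y]),
                matvec(params["W_h"], h),
                params["b"],
            )
        ]
        h = [math.tanh(p) for p in pre]
        x = matvec(params["W_out"], h)  # proposed query point
        y = f(x)                        # only the function value is observed
        queries.append(x)
        values.append(y)
    return queries, values


def sphere(x):
    # Stand-in black-box objective (an assumption for illustration).
    return sum(v * v for v in x)


queries, values = rollout(sphere, init_params(dim=2, hidden=8), dim=2, horizon=10)
best_value = min(values)  # best observation found within the query budget
```

With untrained weights the proposals are arbitrary; the sketch only shows the interface, in which the optimizer sees past queries and values and never a gradient of the objective.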

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning to learn with quantum neural networks via classical neural networks
quant-ph 2019-07 unverdicted novelty 7.0

Classical RNNs trained on small instances provide parameter initializations for QAOA and VQE that reduce total optimization iterations and generalize across problem sizes.