VeLO: Training Versatile Learned Optimizers by Scaling Up

Adam Roberts; Amil Merchant; Ben Poole; C. Daniel Freeman; Igor Mordatch; James Bradbury; James Harrison; Jascha Sohl-Dickstein; Lucas Beyer; Luke Metz

arxiv: 2211.09760 · v1 · pith:IX2NZSVRnew · submitted 2022-11-17 · 💻 cs.LG · math.OC · stat.ML

Luke Metz, James Harrison, C. Daniel Freeman, Amil Merchant, Lucas Beyer, James Bradbury, Naman Agrawal, Ben Poole, Igor Mordatch, Adam Roberts, Jascha Sohl-Dickstein

classification 💻 cs.LG · math.OC · stat.ML

keywords optimizer deep learning optimizers hand-designed learned models scaling

While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
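The idea of an optimizer that is "itself a small neural network that ingests gradients and outputs parameter updates" can be illustrated with a toy sketch. The code below is not the VeLO architecture or its released API. The class name `LearnedOptimizerSketch`, the two input features (gradient and a momentum average), the network size, and the hand-picked weights are all assumptions made for illustration; in a real learned optimizer those weights would come from meta-training rather than being set by hand.

```python
import math


def mlp_update(features, w1, b1, w2):
    """Tiny MLP: features -> tanh hidden layer -> scalar update direction."""
    hidden = [
        math.tanh(sum(w * f for w, f in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden))


class LearnedOptimizerSketch:
    """Hypothetical per-parameter optimizer whose update rule is a small MLP."""

    def __init__(self, n_params, decay=0.9, step_scale=0.1):
        # Assumption: hand-picked weights standing in for meta-trained ones.
        self.w1 = [[1.0, 0.0], [0.0, 1.0]]
        self.b1 = [0.0, 0.0]
        self.w2 = [1.0, 0.5]
        self.decay = decay
        self.step_scale = step_scale
        self.momentum = [0.0] * n_params

    def step(self, params, grads):
        """Ingest gradients, return updated parameters."""
        new_params = []
        for i, (p, g) in enumerate(zip(params, grads)):
            # Exponential moving average of the gradient, used as a second feature.
            self.momentum[i] = self.decay * self.momentum[i] + (1 - self.decay) * g
            out = mlp_update([g, self.momentum[i]], self.w1, self.b1, self.w2)
            new_params.append(p - self.step_scale * out)
        return new_params


def loss(params, target):
    """Simple quadratic test problem."""
    return sum((p - t) ** 2 for p, t in zip(params, target))


def grad(params, target):
    """Analytic gradient of the quadratic loss."""
    return [2 * (p - t) for p, t in zip(params, target)]


target = [1.0, -2.0, 0.5]
params = [0.0, 0.0, 0.0]
opt = LearnedOptimizerSketch(len(params))
initial_loss = loss(params, target)
for _ in range(200):
    params = opt.step(params, grad(params, target))
final_loss = loss(params, target)
```

The caller only supplies gradients and receives new parameters; everything about the step direction and size is decided inside the network. Meta-training, as described in the abstract, would amount to adjusting `w1`, `b1`, and `w2` across many tasks so that this update rule works well without per-task tuning.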

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
cs.AI 2024-08 unverdicted novelty 8.0
The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.

FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows
cs.CV 2026-06 unverdicted novelty 7.0
FlowBender introduces closed-loop training that lets conditional flow models learn correction policies from their own task-specific alignment errors, outperforming supervised and guidance baselines on fidelity and pla...

Learn2Splat: Extending the Horizon of Learned 3DGS Optimization
cs.CV 2026-05 unverdicted novelty 7.0
A meta-learned optimizer for 3DGS that extends the optimization horizon via checkpoint buffers and latent gradient-scale encoding, delivering better early novel-view quality and long-term stability with zero-shot gene...

Quasi-Equivariant Metanetworks
cs.LG 2026-04 unverdicted novelty 7.0
Quasi-equivariant metanetworks relax strict equivariance to preserve functional identity in weight-space learning while improving expressivity for feedforward, convolutional, and transformer networks.

Dynamic Neural Graph Encoding of Inference Processes in Deep Weight Space
cs.LG 2026-07 unverdicted novelty 5.0
DNG-Encoder represents NN weights as dynamic graphs to preserve sequential inference and powers INR2JLS, which raises INR classification accuracy by ~10% on CIFAR-100-INR.

AI Training Manager: Bounded Closed-Loop Control of Adaptive Training Recipes
cs.AI 2026-06 unverdicted novelty 4.0
An LLM-based bounded controller adapts ML training parameters from structured telemetry to correct overfitting and exploration issues, shown on TinyStories and robotic RL tasks.