VeLO: Training Versatile Learned Optimizers by Scaling Up
While deep learning models have replaced hand-designed features across many domains, these models are still trained with hand-designed optimizers. In this work, we leverage the same scaling approach behind the success of deep learning to learn versatile optimizers. We train an optimizer for deep learning which is itself a small neural network that ingests gradients and outputs parameter updates. Meta-trained with approximately four thousand TPU-months of compute on a wide variety of optimization tasks, our optimizer not only exhibits compelling performance, but optimizes in interesting and unexpected ways. It requires no hyperparameter tuning, instead automatically adapting to the specifics of the problem being optimized. We open source our learned optimizer, meta-training code, the associated train and test data, and an extensive optimizer benchmark suite with baselines at velo-code.github.io.
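The abstract describes the optimizer as a small neural network that takes gradients as input, produces parameter updates, and is meta-trained across many tasks. The Python sketch below shows that idea at toy scale. Several choices in it are assumptions made for illustration and are not VeLO's architecture or training procedure: the helper names (`mlp`, `learned_update`, `sample_task`, `run_task`, `meta_train`), the (gradient, momentum) input features, the direction/log-magnitude output split, the quadratic task family, and random-search meta-training.

```python
import math
import random

# Toy learned optimizer (hypothetical): a tiny per-parameter MLP maps
# (gradient, momentum) to an update. Sizes, features, and the task family
# are illustrative assumptions, not VeLO's actual design.
HIDDEN = 4
N_IN, N_OUT = 2, 2  # inputs: (grad, momentum); outputs: (direction, log-magnitude)
N_WEIGHTS = HIDDEN * N_IN + N_OUT * HIDDEN  # biases omitted for brevity


def mlp(theta, features):
    """Evaluate the MLP with flat weight vector `theta` (no biases)."""
    w1, w2 = theta[: HIDDEN * N_IN], theta[HIDDEN * N_IN:]
    hidden = [
        math.tanh(sum(w1[j * N_IN + k] * features[k] for k in range(N_IN)))
        for j in range(HIDDEN)
    ]
    return [sum(w2[o * HIDDEN + j] * hidden[j] for j in range(HIDDEN)) for o in range(N_OUT)]


def learned_update(theta, grad, mom, lr=0.1):
    """Update = lr * tanh(direction) * exp(clamped log-magnitude); always bounded."""
    direction, log_mag = mlp(theta, [grad, mom])
    return lr * math.tanh(direction) * math.exp(max(-5.0, min(log_mag, 2.0)))


def sample_task(rng, dim=3):
    """Hypothetical task family: separable quadratics with random curvature."""
    curv = [rng.uniform(0.1, 10.0) for _ in range(dim)]
    target = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
    init = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
    return curv, target, init


def run_task(theta, task, steps=20, beta=0.9):
    """Optimize one task with the learned optimizer and return the final loss."""
    curv, target, init = task
    p, m = list(init), [0.0] * len(init)
    for _ in range(steps):
        for i in range(len(p)):
            g = curv[i] * (p[i] - target[i])  # gradient of 0.5 * a * (p - t)^2
            m[i] = beta * m[i] + (1 - beta) * g
            p[i] -= learned_update(theta, g, m[i])
    return sum(0.5 * a * (x - t) ** 2 for a, x, t in zip(curv, p, target))


def meta_loss(theta, tasks):
    """Average final loss over a batch of tasks."""
    return sum(run_task(theta, t) for t in tasks) / len(tasks)


def meta_train(iters=100, n_tasks=8, sigma=0.3, seed=0):
    """Meta-train theta by random-search hill climbing (a simple stand-in)."""
    rng = random.Random(seed)
    tasks = [sample_task(rng) for _ in range(n_tasks)]
    theta = [rng.gauss(0.0, 0.5) for _ in range(N_WEIGHTS)]
    init_loss = best = meta_loss(theta, tasks)
    for _ in range(iters):
        cand = [w + rng.gauss(0.0, sigma) for w in theta]
        loss = meta_loss(cand, tasks)
        if loss < best:  # keep the perturbation only if it lowers the meta-loss
            theta, best = cand, loss
    return theta, init_loss, best
```

As a usage example, `theta, before, after = meta_train()` returns the meta-trained weights along with the meta-loss before and after training. The meta-training loop uses plain random search only because it fits in a few lines. It says nothing about how VeLO was actually meta-trained, and VeLO ran at a vastly larger scale (the abstract cites roughly four thousand TPU-months).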
Forward citations
Cited by 6 Pith papers
- The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery
  The AI Scientist framework enables LLMs to independently conduct the full scientific process from idea generation to paper writing and review, demonstrated across three ML subfields with papers costing under $15 each.
- FlowBender: Feedback-Aware Training for Self-Correcting Conditional Flows
  FlowBender introduces closed-loop training that lets conditional flow models learn correction policies from their own task-specific alignment errors, outperforming supervised and guidance baselines on fidelity and pla...
- Learn2Splat: Extending the Horizon of Learned 3DGS Optimization
  A meta-learned optimizer for 3DGS that extends the optimization horizon via checkpoint buffers and latent gradient-scale encoding, delivering better early novel-view quality and long-term stability with zero-shot gene...
- Quasi-Equivariant Metanetworks
  Quasi-equivariant metanetworks relax strict equivariance to preserve functional identity in weight-space learning while improving expressivity for feedforward, convolutional, and transformer networks.
- Dynamic Neural Graph Encoding of Inference Processes in Deep Weight Space
  DNG-Encoder represents NN weights as dynamic graphs to preserve sequential inference and powers INR2JLS, which raises INR classification accuracy by ~10% on CIFAR-100-INR.
- AI Training Manager: Bounded Closed-Loop Control of Adaptive Training Recipes
  An LLM-based bounded controller adapts ML training parameters from structured telemetry to correct overfitting and exploration issues, shown on TinyStories and robotic RL tasks.