Description

VeLO is a learned optimizer meta-trained on thousands of diverse machine learning tasks. It corresponds to the VeLO (Versatile Learned Optimizer) introduced in the paper "VeLO: Training Versatile Learned Optimizers by Scaling Up".

Learned optimizer meta training and architectural details

Field	Value
Meta-training distribution	Thousands of ML tasks including MLPs, CNNs, ResNets, VAEs, classification, regression
Number of meta-training TPU-months	~4000
Target inner problem length	150000 (max)
Gradient estimator	Evolution Strategies
Architecture	LSTM-based hypernetwork
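
The table lists Evolution Strategies (ES) as the gradient estimator. As a rough illustration of the idea only, and not VeLO's actual meta-training code, the sketch below estimates the gradient of a toy quadratic objective with antithetic ES in pure Python. The function names, noise scale, and sample count are illustrative assumptions.

```python
import random

def toy_objective(theta):
    """Toy stand-in for a meta-loss: sum of squares, so the true gradient is 2 * theta."""
    return sum(t * t for t in theta)

def es_gradient(f, theta, sigma=0.1, num_pairs=4000, seed=0):
    """Antithetic ES estimate of the gradient of f at theta (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(num_pairs):
        # Sample one Gaussian perturbation and evaluate f at theta +/- sigma * eps.
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        # Finite difference along eps, averaged over all perturbation pairs.
        scale = (plus - minus) / (2.0 * sigma)
        for i in range(dim):
            grad[i] += scale * eps[i] / num_pairs
    return grad

theta = [1.0, -2.0, 0.5]
estimate = es_gradient(toy_objective, theta)
exact = [2.0 * t for t in theta]
print("ES estimate:", estimate)
print("Exact gradient:", exact)
```

The estimate only needs evaluations of the objective, not its derivatives, which is why ES-style estimators suit meta-training through long unrolled inner problems.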

Usage

(1) Install PyLO

The following commands clone the repository and install PyLO:

git clone https://github.com/Belilovsky-Lab/pylo
cd pylo
pip install .
python setup.py install --cuda

(2) Use VeLO as a drop-in replacement for PyTorch optimizers


from pylo.optim import VeLO
optimizer = VeLO(model.parameters(), lr=1.0, num_steps=150_000)

(3) A simple example

The following example is for illustration purposes and does not implement the correct parameterization. For a correct implementation, see https://github.com/Belilovsky-Lab/pylo/tree/main/examples

import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
from pylo.optim import VeLO

# Device (defined before the model so that .to(device) works)
device = torch.device('cuda')

# Model
class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(28 * 28, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )
    def forward(self, x):
        return self.net(x)

model = MLP().to(device)

#########################
# Setup Learned Optimizer
#########################
optimizer = VeLO(model.parameters(), lr=1.0, num_steps=150_000)

# Data
transform = transforms.ToTensor()
train_loader = DataLoader(datasets.MNIST(root='./data', train=True, download=True, transform=transform),
                          batch_size=64, shuffle=True)

criterion = nn.CrossEntropyLoss()
# Training loop
for epoch in range(1):  # Just 1 epoch for simplicity
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step(loss)

print("Done!")

Note: VeLO requires the total number of training steps as input to initialize its internal states and compute training progress features.
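
One way to obtain this value ahead of time is to derive it from the dataset size, batch size, and number of epochs. The helper below is a minimal sketch, not part of PyLO; its name and arguments are hypothetical, and it assumes one optimizer step per batch with no gradient accumulation.

```python
import math

def total_training_steps(num_samples, batch_size, epochs, drop_last=False):
    """Number of optimizer steps for a run (hypothetical helper, not a PyLO API)."""
    if drop_last:
        # The final incomplete batch is discarded.
        steps_per_epoch = num_samples // batch_size
    else:
        # The final incomplete batch still counts as one step.
        steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch * epochs

# MNIST training split (60,000 images), batch size 64, one epoch.
num_steps = total_training_steps(num_samples=60_000, batch_size=64, epochs=1)
print(num_steps)  # 938
```

With a PyTorch DataLoader, the same number is available as len(train_loader) * epochs, and it can then be passed as num_steps when constructing the optimizer.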

Official Resources

Paper: VeLO: Training Versatile Learned Optimizers by Scaling Up
Adapted from Repository: Google Learned Optimization
Available in PyLO: https://github.com/Belilovsky-Lab/pylo

Important Notes

Step Requirement: VeLO requires knowledge of total training steps for proper initialization
Computational Cost: Meta-trained using ~4000 TPU-months, representing significant computational investment
Generalization: Designed for meta-generalization across diverse optimization tasks
No Hyperparameter Tuning: Automatically adapts to problem specifics without manual tuning

Cite

If you found this optimizer useful in your research, please consider citing the original work:

@article{metz2022velo,
  title={{VeLO}: Training Versatile Learned Optimizers by Scaling Up},
  author={Luke Metz and James Harrison and C. Daniel Freeman and Amil Merchant and Lucas Beyer and James Bradbury and Naman Agrawal and Ben Poole and Igor Mordatch and Adam Roberts and Jascha Sohl-Dickstein},
  journal={arXiv preprint arXiv:2211.09760},
  year={2022},
  url={https://arxiv.org/abs/2211.09760}
}
