
The schedule.step() should be called outside the dataloader loop. #15

Open

wangguanan opened this issue Dec 18, 2023 · 3 comments

@wangguanan

Thanks to OpenAI and the Superalignment Generalization Team for this awesome work.

While reading the code of the vision part, I found a minor bug related to CosineAnnealingLR. The learning rate scheduler is configured with n_epochs rather than n_iters:

schedule = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer=optimizer, T_max=n_epochs)

so schedule.step() should be called outside the train_loader loop; otherwise the scheduler completes its cosine cycle after only n_epochs iterations, i.e. part-way through the first epoch. Correspondingly:

    for epoch in (pbar := tqdm.tqdm(range(n_epochs), desc="Epoch 0")):
        correct, total = 0, 0
        for x, y in train_loader:
            x, y = x.cuda(), y.cuda()
            optimizer.zero_grad()
            pred = model(x)
            loss = criterion(pred, y)
            loss.backward()
            optimizer.step()
            schedule.step() # <-- remove
            if len(y.shape) > 1:
                y = torch.argmax(y, dim=1)
            correct += (torch.argmax(pred, -1) == y).detach().float().sum().item()
            total += len(y)
        schedule.step() # <-- add
        pbar.set_description(f"Epoch {epoch}, Train Acc {correct / total:.3f}")
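
(Just a side note, not how the repo currently does it: an equivalent fix would be to keep schedule.step() inside the batch loop but measure T_max in iterations instead of epochs, e.g.:)

    # Alternative sketch: step the scheduler every iteration, with T_max sized in iterations.
    schedule = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer=optimizer, T_max=n_epochs * len(train_loader)
    )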

After fixing the logic, the final results should look like this:

| Model | Top-1 Acc | Top-1 Acc (schedule outside) |
| --- | --- | --- |
| AlexNet | 56.6 | - |
| DINO ResNet50 | 63.7 | - |
| DINO ViT-B/8 | 74.9 | - |
| AlexNet → DINO ResNet50 | 60.7 | 61.9 (+1.2) |
| AlexNet → DINO ViT-B/8 | 64.2 | 67.1 (+2.9) |
wangguanan changed the title from "Since the CosineAnnealingLR is set with n_epochs, the schedule.step() should be called outside the dataloader loop." to "The schedule.step() should be called outside the dataloader loop." on Dec 18, 2023
@WuTheFWasThat
Contributor

@pavel-izmailov

@wangguanan
Author

One more thing: using multiple worker processes can significantly speed up data loading, i.e. reduce training and inference time. This can be done by setting num_workers > 0, correspondingly:

here

def get_imagenet(datapath, split, batch_size, shuffle, transform=TRANSFORM):
    ds = torchvision.datasets.ImageNet(root=datapath, split=split, transform=transform)
    loader = torch.utils.data.DataLoader(ds, shuffle=shuffle, batch_size=batch_size, num_workers=min(batch_size//16, 8)) # <-- add num_workers=min(batch_size//16, 8)
    return ds, loader

here

train_loader = torch.utils.data.DataLoader(train_ds, shuffle=True, batch_size=batch_size, num_workers=min(batch_size//16, 8)) # <-- add num_workers=min(batch_size//16, 8)
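
(Also just a suggestion, not part of the repo's code: pin_memory=True and persistent_workers=True can further help when training on GPU, e.g.:)

train_loader = torch.utils.data.DataLoader(
    train_ds, shuffle=True, batch_size=batch_size,
    num_workers=min(batch_size // 16, 8),  # parallel data loading (must be > 0 for persistent_workers)
    pin_memory=True,                       # page-locked host memory makes .cuda() copies faster
    persistent_workers=True,               # keep worker processes alive between epochs
)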

@WuTheFWasThat
Contributor

That's great, feel free to make PRs!
