Training vision models with full-batch gradient descent and regularization

Last update: Jan 06, 2023

Related tags

Overview

Stochastic Training is Not Necessary for Generalization -- Training competitive vision models without stochasticity

This repository implements training routines to train vision models for classification on CIFAR-10 without SGD as described in our publication https://arxiv.org/abs/2109.14119.

Abstract:

It is widely believed that the implicit regularization of SGD is fundamental to the impressive generalization behavior we observe in neural networks. In this work, we demonstrate that non-stochastic full-batch training can achieve comparably strong performance to SGD on CIFAR-10 using modern architectures. To this end, we show that the implicit regularization of SGD can be completely replaced with explicit regularization. Our observations indicate that the perceived difficulty of full-batch training is largely the result of its optimization properties and the disproportionate time and effort spent by the ML community tuning optimizers and hyperparameters for small-batch training.

Requirements

PyTorch 1.9.*
Hydra 1.* (via pip install hydra-core --upgrade)
python-lmdb (only for N x CIFAR experiments.)

Pytorch 1.9.* is used for multithreaded persistent workers, _for_each_, functionality, and torch.inference_mode, all of which have to be replaced when using earlier versions.

Training Runs

Scripts for training runs can be found in train.sh

Guide

The main script for fullbatch training should be train_with_gradient_descent.py. This scripts also runs the stochastic gradient descent sanity check with the same codebase. A crucial flag to distinguish between both settings is hyp.train_stochastic=False. The gradient regularization is activated by setting hyp.grad_reg.block_strength=0.5. Have a look at the various folders under config to see all options. The script crunch_loss_landscape.py can measure the loss landscape of checkpointed models which can be used later for visualization.

If you are interesting in extracting components from this repository, you can use the gradient regularization separately, which can be found under fullbatch/models/modules.py in the class GradRegularizer. This regularizer can be instantiated like so:

gradreg = GradRegularizer(model, optimizer, loss_fn, block_strength=0.5, acc_strength=0.0, eps=1e-2,
                          implementation='finite_diff', mixed_precision=False)

and then used during training to modify a list of gradients (such as from torch.autograd.grad) in-place:

grads = gradreg(grads, inputs, labels, pre_grads=None)

Contact

Just open an issue here on github or send us an email if you have any questions.

You might also like...

[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models Codes for this paper The Lottery Tickets Hypo

59 Dec 28, 2022

Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks

Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks (SDPoint) This repository contains the cod

17 Jul 4, 2022

[ICLR 2021] Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization Kaidi Cao, Yining Chen, Junwei Lu, Nikos Arechiga, Adrien Gaidon, Tengyu Ma

29 Oct 20, 2022

PyTorch framework, for reproducing experiments from the paper Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks. Code, based on the PyTorch framework, for reprodu

3 Dec 27, 2022

Consistency Regularization for Adversarial Robustness

Consistency Regularization for Adversarial Robustness Official PyTorch implementation of Consistency Regularization for Adversarial Robustness by Jiho

40 Dec 17, 2022

Code for CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

CoMatch: Semi-supervised Learning with Contrastive Graph Regularization (Salesforce Research) This is a PyTorch implementation of the CoMatch paper [B

107 Dec 14, 2022

The code release of paper 'Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization' NIPS 2020.

Domain Generalization for Medical Imaging Classification with Linear Dependency Regularization The code release of paper 'Domain Generalization for Me

56 Dec 28, 2022

PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

SelfReg PyTorch official implementation of Self-supervised Contrastive Regularization for Domain Generalization (SelfReg, https://arxiv.org/abs/2104.0

64 Dec 16, 2022

Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning.

xTune Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning. Environment DockerFile: dancingsoul/pytorch:xTune Install the f

42 Dec 9, 2022

Training vision models with full-batch gradient descent and regularization

Related tags

Overview

Stochastic Training is Not Necessary for Generalization -- Training competitive vision models without stochasticity

Abstract:

Requirements

Training Runs

Guide

Contact

You might also like...

[CVPR 2021] "The Lottery Tickets Hypothesis for Supervised and Self-supervised Pre-training in Computer Vision Models" Tianlong Chen, Jonathan Frankle, Shiyu Chang, Sijia Liu, Yang Zhang, Michael Carbin, Zhangyang Wang

Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks

[ICLR 2021] Heteroskedastic and Imbalanced Deep Learning with Adaptive Regularization

PyTorch framework, for reproducing experiments from the paper Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

Consistency Regularization for Adversarial Robustness

Code for CoMatch: Semi-supervised Learning with Contrastive Graph Regularization

The code release of paper 'Domain Generalization for Medical Imaging Classification with Linear-Dependency Regularization' NIPS 2020.

PyTorch implementation of Self-supervised Contrastive Regularization for DG (SelfReg)

Code for ACL2021 paper Consistency Regularization for Cross-Lingual Fine-Tuning.

Releases(v1)

v1(Dec 15, 2021)

Owner

Jonas Geiping

这是一个facenet-pytorch的库，可以用于训练自己的人脸识别模型。

ParaGen is a PyTorch deep learning framework for parallel sequence generation

Keep CALM and Improve Visual Feature Attribution

Fake videos detection by tracing the source using video hashing retrieval.

Combining Automatic Labelers and Expert Annotations for Accurate Radiology Report Labeling Using BERT

Code release for the ICML 2021 paper "PixelTransformer: Sample Conditioned Signal Generation".

Unofficial pytorch-lightning implement of Mip-NeRF

Adjusting for Autocorrelated Errors in Neural Networks for Time Series

Reviatalizing Optimization for 3D Human Pose and Shape Estimation: A Sparse Constrained Formulation

Unofficial PyTorch implementation of SimCLR by Google Brain

Code for the ICASSP-2021 paper: Continuous Speech Separation with Conformer.

Source Code for DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances (https://arxiv.org/pdf/2012.01775.pdf)

Pytorch implementation of the DeepDream computer vision algorithm

Official re-implementation of the Calibrated Adversarial Refinement model described in the paper Calibrated Adversarial Refinement for Stochastic Semantic Segmentation

A quantum game modeling of pandemic (QHack 2022)

Official PyTorch implementation of N-ImageNet: Towards Robust, Fine-Grained Object Recognition with Event Cameras (ICCV 2021)

Supervised Classification from Text (P)

[NeurIPS 2020] Blind Video Temporal Consistency via Deep Video Prior

PyTorch implementation of the Flow Gaussian Mixture Model (FlowGMM) model from our paper

Much faster than SORT(Simple Online and Realtime Tracking), a little worse than SORT