Supporting code for the Neograd algorithm

Last update: May 01, 2022

Related tags

Overview

Neograd

This repo supports the paper Neograd: Gradient Descent with a Near-Ideal Learning Rate, which introduces the algorithm "Neograd". The paper and associated code are by Michael F. Zimmer. It's been submitted to JMLR.

Getting Started

Download the code. Paths within the program are relative.

Prerequisites

Python 3
Jupyter notebook

Installing

Unzip/clone the repo. You should see this directory structure:
neograd/
libs/
notebooks/
figs/
The meaning of these names is self-explanatory. Only the name "notebooks" can be changed without interfering with the paths.

Running Notebooks

After cd-ing into the "notebooks" directory, open a notebook in Jupyter and execute the cells. If you choose to uncomment certain lines (the save fig command) a figure will be saved for you. Some of these are the same figs that appear in the aforementioned paper.

Descriptions of notebooks

These experiment notebooks contain evaluations of algorithms against the named cost fcn
EXPT_2Dshell
EXPT_Beale
EXPT_double
EXPT_quartic
EXPT_sigmoid-well

Additionally, these contain additional tests.
EXPT_hybrid
EXPT_manual
EXPT_momentum

Descriptions of libraries

algos_vec
Functions that are central to the GD family and Neograd family.

common
Functions for rho, alpha, and functions for tracking results of a run.

common_vec
Functions used by algos_vec, which aren't central to the algorithms. Also, these functions have a specific assumption that the "parameter vector" is a numpy array.

costgrad_vec
This is an aggregation of all the functions needed to compute the cost and gradient of the specific cost functions examined in the paper.

params
Contains all global parameters (not to be confused with the parameter vector that is being optimized). Also present is a function to return a "good choice" of alpha for each algorithm-cost function combination, as determined by trial and error.

plotting
The plotting functions are passed the dictionaries of results returned by the optimization runs

A few details

"p" represents the parameter vector in the repo; note this differs from "theta" which is used in the paper.

Statistics during the run are accumulated by a dictionary of lists. The keys in the dictionary contain the name of the statistic, and the "values" are lists. Before entering the main loop, the names/keys must be declared; this is done in the function "init_results". After each iteration, a list will have a value appended to it; this is done in the function "update_results". Both of these functions are in the "common" library.

If you set the total iteration number ("num") too high, you may find you get underflow errors plus their ramifications. This is because the Neograd algorithm will drive the error down to be so small, it bumps up against machine precision. There are a number of sophisticated ways to handle this, but for the purposes here it is enough to simply stop the optimization before it becomes an issue.

In the code on github, this alternative definition of rho may be used. Simply change the parameter "g_rhotype" to "original", instead of "new". This is discussed in an appendix of the paper.

Author

Michael F. Zimmer

License

This project is licensed under the MIT license.

Supporting code for the Neograd algorithm

Related tags

Overview

Neograd

Getting Started

Prerequisites

Installing

Running Notebooks

Descriptions of notebooks

Descriptions of libraries

A few details

Author

License

Owner

Michael Zimmer

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Convolutional 2D Knowledge Graph Embeddings resources

SmallInitEmb - LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence

Base pretrained models and datasets in pytorch (MNIST, SVHN, CIFAR10, CIFAR100, STL10, AlexNet, VGG16, VGG19, ResNet, Inception, SqueezeNet)

Multimodal Co-Attention Transformer (MCAT) for Survival Prediction in Gigapixel Whole Slide Images

This is a five-step framework for the development of intrusion detection systems (IDS) using machine learning (ML) considering model realization, and performance evaluation.

An image processing project uses Viola-jones technique to detect faces and then use SIFT algorithm for recognition.

PyTorch code for our paper "Image Super-Resolution with Non-Local Sparse Attention" (CVPR2021).

This is the repository for our paper Ditch the Gold Standard: Re-evaluating Conversational Question Answering

Paper: De-rendering Stylized Texts

GAN-based Matrix Factorization for Recommender Systems

The Python3 import playground

Source code for paper "Deep Superpixel-based Network for Blind Image Quality Assessment"

A collection of implementations of deep domain adaptation algorithms

Translate darknet to tensorflow. Load trained weights, retrain/fine-tune using tensorflow, export constant graph def to mobile devices

Repository for "Exploring Sparsity in Image Super-Resolution for Efficient Inference", CVPR 2021

Mememoji - A facial expression classification system that recognizes 6 basic emotions: happy, sad, surprise, fear, anger and neutral.

This code is for eCaReNet: explainable Cancer Relapse Prediction Network.

Image Captioning using CNN ,LSTM and Attention

Intelligent Video Analytics toolkit based on different inference backends.