Repository for DNN training, theory to practice, part of the Large Scale Machine Learning class at Mines Paritech

Overview

DNN Training, from theory to practice

This repository is complementary to the deep learning training lesson given to les Mines ParisTech on the 11th of March 2022 as part of the Large Scale Machine Learning class.

You can find here the slides of the class.

Requirements

To get started, clone it and prepare a new virtual env.

git clone https://github.com/adefossez/dnn_theo_practice
cd dnn_theo_practice
python3 -m venv env
source env/bin/activate
python3 -m pip install -r requirements.txt

Note: it can be safer to install PyTorch through a conda environment to make sure all proper versions of CUDA realted libraries are installed and used. We use pip here for simplicity.

Basic training pipeline

To get started, you can run

python -m basic.train

You can tweak some hyper parameters:

python -m basic.train --lr 0.1 --epochs 30 --model mobilenet_v2

This basic pipeline provides all the essential tools for training a neural network:

  • automatic experiment naming,
  • logging and metric dumping,
  • checkpointing with automatic resume.

Looking at basic/train.py you will see that 90% of the code is not deep learning but pure engineering. Some frameworks like PyTorch Lightning can save you some of this trouble, at the cost of losing control and understanding over what happens. In any case it is good to have an idea of how things work under the hood!

PyTorch-Lightning training pipeline

Insite the pl_hydra folder, I provide the same pipeline, but using PyTorch-Lightning along with Hydra, as an alternative to argparse. Have a look at pl_hydra/train.py to see the differences with the previous implementation.

python -m pl_hydra.train optim.lr=0.1 model=mobilenet_v2

Using existing frameworks:

At this point, it is a good time to introduce a few frameworks you might want to use for your projects.

Hydra

Hydra handles things like logging, configuration parsing (based on YAML files, which is a bit nicer than argparse, especially for large projects), and also has support for some grid search scheduling with a dedicated language. It also supports meta-optimizers like Nevergrad (see after).

Nevergrad

Nevergrad is a framework for gradient free optimization. It can be used to automatically tune your model or optimization hyper-parameters with smart random search.

PyTorch-Lightning

PyTorch Lightning takes care of logging, distributed training, checkpointing and many more boilerplate parts of a deep learning research project. It is powerful but also quite complex, and you will lose some control over the training pipeline.

Dora

Dora is an experiment management framework:

  • Grid searches are expressed as pure python.
  • Experiments have an automatic signature assigned based on its args.
  • Keeps in sync experiments defined in grid files, and those running on the cluster.
  • Basic terminal based reporting of job states, metrics etc.

Dora allows you to scale up to hundreds of experiments without losing your sanity.

Plotting and monitoring utilities

While it is always good to have basic metric reporting inside logs, it can be more conveniant to track experimental progress through a web browser. TensorBoard, initially developed for TensorFlow provide just that. A fully hosted alternative is Wandb. Finally, HiPlot is a lightweight package to easily make sense of the impact of hyperparameters on the metrics of interest.

Unix tools

It is a good idea to learn to master the standard Unix/Linux tools! For large scale machine learning, you will often have to run experiments on a remote cluster, with only SSH access. tmux is a must have, as well as knowing at least of one terminal based file editor (nano is the simplest, emacs or vim are more complex but quite powerful). Take some time to learn about tuning your bashrc, setting up aliases for often used commands etc.

You will probably need tools like grep, less, find or ack. I personnaly really enjoy fd, an alternative to find with some intuitive interface. Similarly ag is a nice way to quickly look through a codebase in the terminal. If you need to go through a lot of logs, you will enjoy ripgreg.

License

This code in this repository is released into the public domain. You can freely reuse any part of it and you don't even need to say where you found it! See the LICENSE for more information.

The slides are released under Creative Commons CC-BY-NC.

Owner
Alexandre Défossez
Alexandre Défossez
CuraMultiplyByGrid - Cura Плагин для размножения детали сеткой на весь стол автоматически без поворота

CuraMultiplyByGrid Cura Плагин для размножения детали сеткой на весь стол автоматически без поворота. Размножение в куре настолько ужасно реализовано,

3 Dec 02, 2022
Library support get vocabulary from MEM

Features: Support scraping the courses in MEM to take the vocabulary Translate the words to your own language Get the IPA for the English course Insta

Joseph Quang 4 Aug 13, 2022
Software that extracts spreadsheets from various .pdf files to .csv

Extração de planilhas de diversos arquivos .pdf para .csv O código inteiro foi desenvolvido em Python. Foi utilizado o pacote "tabula" e a biblioteca

Marcos Silva 2 Jan 09, 2022
Zeus is an open source flight intellingence tool which supports more than 13,000+ airlines and 250+ countries.

Zeus Zeus is an open source flight intellingence tool which supports more than 13,000+ airlines and 250+ countries. Any flight worldwide, at your fing

DeVickey 1 Oct 22, 2021
In this project, we'll be creating a virtual personal assistant for ourselves using our favorite programming language

In this project, we'll be creating a virtual personal assistant for ourselves using our favorite programming language, Python. We can perform several offline as well as online operations using the bo

Ashutosh Krishna 188 Jan 03, 2023
✔️ Create to-do lists to easily manage your ideas and work.

Todo List + Add task + Remove task + List completed task + List not completed task + Set clock task time + View task statistics by date Changelog v 1.

Abbas Ataei 30 Nov 28, 2022
Extend the maya channel box with searchability and colour

channel-box-plus will add search-ability over its attributes, and it will colour user defined attributes, making them easier to distinguish.

Robert Joosten 12 Jun 08, 2022
A Python module for decorators, wrappers and monkey patching.

wrapt The aim of the wrapt module is to provide a transparent object proxy for Python, which can be used as the basis for the construction of function

Graham Dumpleton 1.8k Jan 06, 2023
Python project setup, updater, and launcher

Launcher Python project setup, updater, and launcher Purpose: Increase project productivity and provide features easily. Once installed as a git submo

DAAV, LLC 1 Jan 07, 2022
Print 'text color' and 'text format' on Term with Python

term-printer Print 'text color' and 'text format' on Term with Python ※ It may not work depending on the OS and shell used. PIP $ pip install term-pri

ななといつ 10 Nov 12, 2022
Make discord server By Coding!

Discord Server Maker Make discord server by Coding! FAQ How can i get role permissons? Open discord with chrome developer tool, go to network and clic

1 Jul 17, 2022
Aesthetic NFT Generator

A E S T H E T I C Dependencies Pillow numpy OpenCV You can use pip to install any missing dependencies. Basic Usage Vaporwave artwork can be generated

Mentor Elezi 4 Mar 13, 2022
Modelling and Implementation of Cable Driven Parallel Manipulator System with Tension Control

Cable Driven Parallel Robots (CDPR) is also known as Cable-Suspended Robots are the emerging and flexible end effector manipulation system. Cable-driven parallel robots (CDPRs) are categorized as a t

Siddharth U 0 Jul 19, 2022
dynamically create __slots__ objects with less code

slots_factory Factory functions and decorators for creating slot objects Slots are a python construct that allows users to create an object that doesn

Michael Green 2 Sep 07, 2021
A pypi package details search python module

A pypi package details search python module

Fayas Noushad 5 Nov 30, 2021
This is a pretty basic but relatively nice looking Python Pomodoro Timer.

Python Pomodoro-Timer This is a pretty basic but relatively nice looking Pomodoro Timer. Currently its set to a very basic mode, but the funcationalit

EmmHarris 2 Oct 18, 2021
Simple dotfile pre-processor with a per-file configuration

ix (eeks) Simple dotfile pre-processor with a per-file configuration Summary (TL;DR) ix.py is all you need config is an ini file. files to be processe

Poly 12 Dec 16, 2021
A (hopefully) considerably copious collection of classical cipher crackers

ClassicalCipherCracker A (hopefully) considerably copious collection of classical cipher crackers Written in Python3 (and run with PyPy) TODOs Write a

Stanley Zhong 2 Feb 22, 2022
TimeWizard - A script that generates every single Time Wizard EDOPRO lflist possible

EDOPRO F&L list generator This project is just a script that generates every sin

Diamond Dude 2 Sep 28, 2022
Mommas-cookbook - A Repository About Mom's Recipes

Mommas Cookbook A Repository for Mom's Recipes Contents bacalhau à Gomes de Sá Beef-Rendang bacalhau à Gomes de Sá, recommended by @s0undt3ch One of t

1 Jan 08, 2022