A high-performance distributed deep learning system targeting large-scale and automated distributed training.

Overview

HETU


Hetu is a high-performance distributed deep learning system for training DL models with trillions of parameters, developed by the DAIR Lab at Peking University. It balances the high-availability needs of industry with innovation from academia, and offers the following characteristics:

  • Applicability. DL models are defined as standard dataflow graphs; many basic CPU and GPU operators are provided, along with efficient implementations of many DL models and at least 10 popular ML algorithms.

  • Efficiency. Achieves at least a 30% speedup over TensorFlow on DNN, CNN, and RNN benchmarks.

  • Flexibility. Supports various parallel training protocols and distributed communication architectures, such as data, model, and pipeline parallelism, and both parameter server and AllReduce.

  • Scalability. Deploys on more than 100 computation nodes and trains giant models with trillions of parameters, e.g., on Criteo Kaggle and Open Graph Benchmark.

  • Agility. Automated ML pipeline: feature engineering, model selection, and hyperparameter search.

We welcome everyone interested in machine learning or graph computing to contribute code, open issues, or submit pull requests. Please refer to the Contribution Guide for more details.

Installation

  1. Clone the repository.

  2. Prepare the environment. We use Anaconda to manage packages. The following command creates the conda environment: `conda env create -f environment.yml`. Please install the CUDA toolkit and cuDNN in advance.

  3. We use CMake to compile Hetu. Copy the example configuration with `cp cmake/config.example.cmake cmake/config.cmake`. You can edit the configuration file to enable or disable the compilation of each module. For advanced users (who are not using the provided conda environment), the prerequisites for the different Hetu modules are listed in the Appendix.

# modify paths and configurations in cmake/config.cmake

# generate Makefile
mkdir build && cd build && cmake ..

# compile
# make all
make -j 8
# make hetu, version is specified in cmake/config.cmake
make hetu -j 8
# make allreduce module
make allreduce -j 8
# make ps module
make ps -j 8
# make geometric module
make geometric -j 8
# make hetu-cache module
make hetu_cache -j 8
  4. Prepare the runtime environment. Edit the hetu.exp file to set the Python environment path and, if necessary, the path to the mpirun executable (for advanced users not using the provided conda environment). Then run `source hetu.exp`.
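
Before launching any training script, it can help to confirm that the tools this section relies on are actually reachable from the current shell. The following is a minimal sketch, not part of Hetu: `missing_tools` is a hypothetical helper name, and the tool list in the usage comment is simply the set of commands mentioned in the steps above.

```shell
# Hypothetical helper (not shipped with Hetu): print the names of the
# given commands that cannot be resolved on the current PATH.
missing_tools() {
    local tool missing=""
    for tool in "$@"; do
        # `command -v` succeeds only if the shell can resolve the name.
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="$missing $tool"
        fi
    done
    # Missing names separated by single spaces; empty output means none.
    echo "${missing# }"
}

# Example usage after `source hetu.exp`:
# missing_tools python mpirun cmake make
```

An empty result suggests the environment is ready; any name printed points to a path that still needs to be set in hetu.exp or installed.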

Usage

Train logistic regression on a GPU:

bash examples/cnn/scripts/hetu_1gpu.sh logreg MNIST

Train a 3-layer MLP on a GPU:

bash examples/cnn/scripts/hetu_1gpu.sh mlp CIFAR10

Train a 3-layer CNN on a GPU:

bash examples/cnn/scripts/hetu_1gpu.sh cnn_3_layers MNIST

Train a 3-layer MLP with AllReduce on 8 GPUs (uses mpirun):

bash examples/cnn/scripts/hetu_8gpu.sh mlp CIFAR10

Train a 3-layer MLP with a parameter server (PS) on 1 server and 2 workers:

# the script launches the scheduler, one server, and two workers
bash examples/cnn/scripts/hetu_2gpu_ps.sh mlp CIFAR10
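
All of the commands above follow the same pattern: a launch script, then a model name, then a dataset name. The sketch below makes that pattern explicit by mapping a launch mode to one of the three scripts shown. `hetu_example_cmd` and the mode names `1gpu`, `8gpu`, and `2gpu_ps` are hypothetical and chosen only for illustration; the helper prints the command instead of running it.

```shell
# Hypothetical wrapper (not part of Hetu): build the example command line
# for a launch mode, model, and dataset, following the pattern used above.
hetu_example_cmd() {
    local mode=$1 model=$2 dataset=$3 script
    case "$mode" in
        1gpu)    script=hetu_1gpu.sh ;;     # single GPU
        8gpu)    script=hetu_8gpu.sh ;;     # AllReduce on 8 GPUs
        2gpu_ps) script=hetu_2gpu_ps.sh ;;  # PS with 1 server, 2 workers
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    # Print rather than execute, so the command can be inspected first.
    echo "bash examples/cnn/scripts/$script $model $dataset"
}

# Example usage:
# hetu_example_cmd 1gpu logreg MNIST
```

Piping the output to `sh` from the repository root would run the printed command.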

More Examples

Please refer to the examples directory, which contains CNN, NLP, CTR, and GNN training scripts. For distributed training, please refer to the CTR and GNN tasks.

Community

License

The entire codebase is under license

Papers

  1. Xupeng Miao, Lingxiao Ma, Zhi Yang, Yingxia Shao, Bin Cui, Lele Yu, Jiawei Jiang. CuWide: Towards Efficient Flow-based Training for Sparse Wide Models on GPUs. TKDE 2021, ICDE 2021.
  2. Xupeng Miao, Xiaonan Nie, Yingxia Shao, Zhi Yang, Jiawei Jiang, Lingxiao Ma, Bin Cui. Heterogeneity-Aware Distributed Machine Learning Training via Partial Reduce. SIGMOD 2021.
  3. Xupeng Miao, Hailin Zhang, Yining Shi, Xiaonan Nie, Zhi Yang, Yangyu Tao, Bin Cui. HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework. VLDB 2022, ChinaSys 2021 Winter.
  4. coming soon

Cite

If you use Hetu in a scientific publication, we would appreciate citations to the following paper:

 @inproceedings{vldb/het22,
   title = {HET: Scaling out Huge Embedding Model Training via Cache-enabled Distributed Framework},
   author = {Xupeng Miao and
             Hailin Zhang and
             Yining Shi and
             Xiaonan Nie and
             Zhi Yang and
             Yangyu Tao and
             Bin Cui},
   journal = {Proc. {VLDB} Endow.},
   year = {2022},
   url  = {https://doi.org/10.14778/3489496.3489511},
   doi  = {10.14778/3489496.3489511},
 }

Acknowledgements

We learned and borrowed insights from a few open source projects including TinyFlow, autodist, tf.distribute and Angel.

Appendix

The prerequisites for the different modules in Hetu are listed as follows:

"*" means you must install the prerequisite yourself; the others can be downloaded automatically.

Hetu: OpenMP(*), CMake(*)
Hetu (version mkl): MKL 1.6.1
Hetu (version gpu): CUDA 10.1(*), CUDNN 7.5(*)
Hetu (version all): both

Hetu-AllReduce: MPI 3.1, NCCL 2.8(*), this module needs GPU version

Hetu-PS: Protobuf(*), ZeroMQ 4.3.2

Hetu-Geometric: Pybind11(*), Metis(*)

Hetu-Cache: Pybind11(*), this module needs PS module
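
The list above implies a build-order constraint between make targets: Hetu-Cache needs the PS module. A minimal sketch of how that could be encoded is shown below. `expand_targets` is a hypothetical helper, the target names are the ones used in the Installation section, and only the cache-needs-PS dependency is modeled; the GPU requirement of Hetu-AllReduce is a configuration choice rather than a make target, so it is not covered here.

```shell
# Hypothetical helper (not part of Hetu): expand a list of make targets so
# that `ps` is listed before `hetu_cache`, without adding duplicates.
expand_targets() {
    local out="" t dep x
    for t in "$@"; do
        case "$t" in
            hetu_cache) dep="ps" ;;                   # Hetu-Cache needs PS
            hetu|allreduce|ps|geometric) dep="" ;;    # nothing modeled here
            *) echo "unknown target: $t" >&2; return 1 ;;
        esac
        # Append the prerequisite (if any) and then the target itself,
        # skipping names that are already in the list.
        for x in $dep "$t"; do
            case " $out " in
                *" $x "*) ;;
                *) out="$out $x" ;;
            esac
        done
    done
    echo "${out# }"
}

# Example usage from the build directory:
# make $(expand_targets hetu hetu_cache) -j 8
```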

##################################################################
Tips for preparing the prerequisites

Preparing CUDA, CUDNN, NCCL (NCCL is already in the conda environment):
1. download from https://developer.nvidia.com
2. install
3. modify paths in cmake/config.cmake if necessary

Preparing OpenMP:
You just need to ensure that your compiler supports OpenMP.

Preparing CMake, Protobuf, Pybind11, Metis:
install by anaconda:
conda install cmake=3.18 libprotobuf pybind11=2.6.0 metis

Preparing OpenMPI (not necessary):
install by anaconda: `conda install -c conda-forge openmpi=4.0.3`
or
1. download from https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.3.tar.gz
2. build openmpi by `./configure --prefix=/path/to/build && make -j8 && make install`
3. modify MPI_HOME to /path/to/build in cmake/config.cmake

Preparing MKL (not necessary):
install by anaconda: `conda install -c conda-forge onednn`
or
1. download from https://github.com/intel/mkl-dnn/archive/v1.6.1.tar.gz
2. build mkl by `mkdir /path/to/build && cd /path/to/build && cmake /path/to/root && make -j8` 
3. modify MKL_ROOT to /path/to/root and MKL_BUILD to /path/to/build in cmake/config.cmake 

Preparing ZeroMQ (not necessary):
install by anaconda: `conda install -c anaconda zeromq=4.3.2`
or
1. download from https://github.com/zeromq/libzmq/releases/download/v4.3.2/zeromq-4.3.2.zip
2. build zeromq by `mkdir /path/to/build && cd /path/to/build && cmake /path/to/root && make -j8`
3. modify ZMQ_ROOT to /path/to/build in cmake/config.cmake
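
Several of the tips above end with editing a path variable in cmake/config.cmake. As a quick sanity check after editing, the sketch below reports which of the variable names mentioned in this appendix do not appear in the file at all. `vars_missing_from_config` is a hypothetical helper; it only tests whether a name occurs in the file and does not validate the value or the exact syntax used in config.cmake.

```shell
# Hypothetical helper (not part of Hetu): print the variable names that do
# not occur anywhere in the given configuration file.
vars_missing_from_config() {
    local file=$1 var missing=""
    shift
    for var in "$@"; do
        # -F: match the name literally; -q: only the exit status matters.
        grep -qF -- "$var" "$file" || missing="$missing $var"
    done
    echo "${missing# }"
}

# Example usage from the repository root:
# vars_missing_from_config cmake/config.cmake MPI_HOME MKL_ROOT MKL_BUILD ZMQ_ROOT
```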