OSLO: Open Source framework for Large-scale transformer Optimization

Last update: Nov 24, 2022

Overview

What's New:

December 21, 2021 Released OSLO 1.0.

What is OSLO about?

OSLO is a framework that provides various GPU-based optimization features for large-scale modeling. As of 2021, Hugging Face Transformers is considered the de facto standard, but it is not yet well suited to large-scale modeling. This is where OSLO comes in: it is designed to make it easier to train large models with Transformers. For example, you can fine-tune GPTJ from the Hugging Face Model Hub with little extra effort using OSLO. Currently, GPT2, GPTNeo, and GPTJ are supported, and we plan to support more soon.

Installation

OSLO can be installed with the pip package manager. All dependencies, such as torch, transformers, dacite, ninja and pybind11, should be installed automatically by the following command. Note that the PyPI project name includes 'core'.

pip install oslo-core

Some features rely on C++, so we provide an option, CPP_AVAILABLE, that controls whether they are installed.

If C++ is available:

CPP_AVAILABLE=1 pip install oslo-core

If C++ is not available:

CPP_AVAILABLE=0 pip install oslo-core

Note that the default value of CPP_AVAILABLE is 0 on Windows and 1 on Linux.
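The platform-dependent default can be illustrated with a minimal sketch. The helper name `cpp_available` and its arguments are hypothetical and are not OSLO's actual setup code. Treating every non-Windows system like Linux, and treating any value other than "1" as disabled, are also assumptions, since the text only states the Windows and Linux defaults.

```python
import os
import platform


def cpp_available(environ=None, system=None):
    """Hypothetical helper: decide whether C++ features should be installed."""
    environ = os.environ if environ is None else environ
    system = platform.system() if system is None else system
    # Defaults stated in the text: 0 on Windows, 1 on Linux.
    # Assumption: other systems are treated like Linux.
    default = "0" if system == "Windows" else "1"
    # Assumption: only the exact value "1" enables the C++ features.
    return environ.get("CPP_AVAILABLE", default) == "1"


print(cpp_available({}, "Linux"))                         # True
print(cpp_available({}, "Windows"))                       # False
print(cpp_available({"CPP_AVAILABLE": "1"}, "Windows"))   # True
```

An explicit CPP_AVAILABLE value always overrides the platform default in this sketch, which matches the two install commands shown above.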

Key Features

import deepspeed 
from oslo import GPTJForCausalLM

# 1. 3D Parallelism
model = GPTJForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-j-6B", tensor_parallel_size=2, pipeline_parallel_size=2,
)

# 2. Kernel Fusion
model = model.fuse()

# 3. DeepSpeed Support
engines = deepspeed.initialize(
    model=model.gpu_modules(), model_parameters=model.gpu_parameters(), ...,
)

# 4. Data Processing
from oslo import (
    DatasetPreprocessor, 
    DatasetBlender, 
    DatasetForCausalLM, 
    ...    
)

OSLO offers the following features.

3D Parallelism: The state-of-the-art technique for training a large-scale model with multiple GPUs.
Kernel Fusion: A GPU optimization method to increase training and inference speed.
DeepSpeed Support: We support DeepSpeed, which provides ZeRO data parallelism.
Data Processing: Various utilities for efficient large-scale data processing.

See USAGE.md to learn how to use them.
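As a rough illustration of how the parallel sizes in the example relate to the number of GPUs: in 3D parallelism the world size is the product of the tensor, pipeline and data parallel sizes, so the data parallel size follows from the other two. The function below is a hypothetical helper written for this explanation, not part of the OSLO API.

```python
def data_parallel_size(world_size, tensor_parallel_size, pipeline_parallel_size):
    """Hypothetical helper: derive the data parallel size from the other sizes."""
    model_parallel_size = tensor_parallel_size * pipeline_parallel_size
    if world_size % model_parallel_size != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by "
            f"tensor_parallel_size * pipeline_parallel_size={model_parallel_size}"
        )
    return world_size // model_parallel_size


# With tensor_parallel_size=2 and pipeline_parallel_size=2 as in the example,
# one model replica spans 4 GPUs, so 8 GPUs hold 2 data-parallel replicas.
print(data_parallel_size(8, 2, 2))  # 2
```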

Administrative Notes

Citing OSLO

If you find our work useful, please consider citing:

@misc{oslo,
  author       = {Ko, Hyunwoong and Kim, Soohwan and Park, Kyubyong},
  title        = {OSLO: Open Source framework for Large-scale transformer Optimization},
  howpublished = {\url{https://github.com/tunib-ai/oslo}},
  year         = {2021},
}

Licensing

The code of the OSLO project is licensed under the terms of the Apache License 2.0.

Acknowledgements

The OSLO project is built with GPU support from the AICA (Artificial Intelligence Industry Cluster Agency).

Comments

[WIP] Implement ZeRO Stage 3 (FSDP)
Title

Implement ZeRO Stage 3 (FullyShardedDataParallel)

Description

[x] Add reduce_scatter_bucketer.py

[x] Add test_reduce_scatter_bucketer.py

[x] Add flatten_params_wrapper.py

[x] Add test_flatten_params_wrapper.py

[x] Add containers.py

[x] Add test_containers.py

[x] Add parallel.py

[x] Add test_parallel.py

[x] Add fsdp_optim_utils.py

[x] Update fsdp.py

[x] Add auto_wrap.py

[x] Add test_wrap.py
opened by jinok2im 9
FusedAdam & CPUAdam
Title

FusedAdam & CPUAdam

Description

Implement FusedAdam & CPUAdam

Tasks

[x] Implement FusedAdam

[x] Implement CPUAdam

[x] Test FusedAdam

[x] Test CPUAdam

[x] Test FusedScaleMaskSoftmax (Name changed)
opened by cozytk 6
[WIP] Add data processing modules referring to the lassl
Title

Add data processing modules based on lassl

Description

Brought over data processing functions suited to GPT2, with reference to lassl.

Linked Issues

None
opened by gimmaru 6
Implementation of Sequential Parallelism
SP with DP implementation

Implemented SP wrapper with DP

Description

SequenceDataParallel works like native torch DDP, with SP.

You can find details in the file oslo/tests/torch/nn/parallal/data_parallel/test_sp.py
opened by ohwi 5
Update data collators and Add models
Title

Update data collators and Add models

Description

Updated data collators to utilize sequence parallel in Oslo trainer

Add models by referring to the transformers library
opened by gimmaru 3
Implement Expert Parallel and Test for Initialization and Forward Pass
Title

Implement Expert Parallel and Test for Initialization and Forward Pass

Description

Implement Wrapper, Modules and Features for Expert Parallel

Implement mapping_utils._ParallelMappingForHuggingFace as super class of _TensorParallelMappingForHuggingFace and _ExpertParallelMappingForHuggingFace

Test initialization and forward pass for expert parallel
opened by scsc0511 3
Integrate Sequence Parallelism branches
Title

Sequence parallelism (feat. @reniew, @ohwi, @l-yohai)

Description

This PR integrates the current version of SP, but it still has known problems.

We will fix the bugs over the coming week and write test modules according to the SP design.

It does not include the contents of the branch used for the tests.
opened by l-yohai 3
implement tp-3d layers, wrapper, test codes and refactor all tp test codes and layers
implement tp-3d wrapper

fix the rank transpose problem (tensor_3d_input_rank <-> tensor_3d_output_rank) by implementing a rank transpose function.

revise tp-3d layers for huggingface compatibility

implement tp-3d test codes

refactor all tp test codes

unify format across all tensor parallel modules.
opened by bzantium 2
Refactoring MultiheadAttention with todo anchors
Title

Refactoring MultiheadAttention with todo anchors

Description

Refactoring oslo/torch/nn/modules/functional/multi_head_attention_forward.py.

Remove unnecessary or unintended code and clean up annotations.

Unify return format and the variable name with native torch.

Additionally, I need to test attention_mask. However, this part can likely proceed only after FusedScaleMaskSoftmax is integrated.

cc. @hyunwoongko @ohwi
opened by l-yohai 2
Add tp-1d layers testing
Add testing for tp-1d layers: col_linear, row_linear, vocab_embedding_1d

Replace hard-coded numbers with integer variables such as summa_dim and world_size. cc: @hyunwoongko
opened by bzantium 2
[WIP] add test code of sp training
Title

SP Model Test Code

Description

Writing test code to verify that the gradient and loss values of the model are the same when sequence parallelism is applied.

WIP: merging @ohwi's test code comparing SP of ColossalAI and a simple learning model.
opened by l-yohai 2

Releases(v2.0.2)

v2.0.2(Aug 25, 2022)
Revert oslo to 1.1.2.

v2.0.1(Feb 20, 2022)
Merge changes from functorch upstream.

Fix documents and tutorials

v2.0.0(Feb 14, 2022)
Official release of OSLO 2.0.0 🎉🎉

This version of OSLO provides the following features:

Tensor model parallelism

Efficient activation checkpointing

Kernel fusion

We plan to add pipeline model parallelism and ZeRO optimization in upcoming versions.

New feature: Kernel Fusion

{
  "kernel_fusion": {
    "enable": "bool",
    "memory_efficient_fusion": "bool",
    "custom_cuda_kernels": "list"
  }
}

For more information, please check the kernel fusion tutorial
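The schema above gives only the expected type of each key. A minimal sketch of checking a concrete config against those types could look like the following. The `check_kernel_fusion` helper and the example values are hypothetical and are not part of OSLO; the list of custom kernels is left empty because the valid kernel names are not given here.

```python
# Expected types, taken from the "bool" / "list" placeholders in the schema.
KERNEL_FUSION_SCHEMA = {
    "enable": bool,
    "memory_efficient_fusion": bool,
    "custom_cuda_kernels": list,
}


def check_kernel_fusion(config):
    """Hypothetical helper: return a list of type errors in the kernel_fusion section."""
    errors = []
    for key, value in config.get("kernel_fusion", {}).items():
        expected = KERNEL_FUSION_SCHEMA.get(key)
        if expected is None:
            errors.append(f"unknown key: {key}")
        elif not isinstance(value, expected):
            errors.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


example = {
    "kernel_fusion": {
        "enable": True,
        "memory_efficient_fusion": False,
        "custom_cuda_kernels": [],  # kernel names intentionally left out
    }
}
print(check_kernel_fusion(example))  # []
```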
v2.0.0a2(Feb 2, 2022)

Quick fix of cuda rng state tracker

v2.0.0a1(Feb 2, 2022)

Add activation checkpointing

You can enable efficient activation checkpointing in OSLO with the following configuration.

model = oslo.initialize(
    model,
    config={
        "model_parallelism": {
            "enable": True,
            "tensor_parallel_size": YOUR_TENSOR_PARALLEL_SIZE,
        },
        "activation_checkpointing": {
            "enable": True,
            "cpu_checkpointing": True,
            "partitioned_checkpointing": True,
            "contiguous_checkpointing": True,
        },
    },
)

Tutorial: https://tunib-ai.github.io/oslo/TUTORIALS/activation_checkpointing.html


v2.0.0a0(Jan 30, 2022)
New API

The new API is modeled on DeepSpeed's. Now it's easier and simpler to use.

import oslo

model = oslo.initialize(model, config="oslo-config.json")
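A minimal sketch of producing such a config file with the standard library follows. It assumes the JSON file uses the same keys as the dict-style configuration shown in the v2.0.0a1 notes. The `write_config` helper and the tensor parallel size of 2 are placeholders chosen for illustration.

```python
import json
import os
import tempfile

# Assumption: the file mirrors the dict-style config from the v2.0.0a1 notes.
config = {
    "model_parallelism": {
        "enable": True,
        "tensor_parallel_size": 2,  # placeholder value
    },
}


def write_config(config, directory, filename="oslo-config.json"):
    """Hypothetical helper: serialize the config dict to a JSON file."""
    path = os.path.join(directory, filename)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
    return path


with tempfile.TemporaryDirectory() as tmp:
    path = write_config(config, tmp)
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

# The round trip preserves the structure; the path would then be passed as
# oslo.initialize(model, config=path) while the file still exists.
print(loaded == config)  # True
```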

Add new models

Albert

Bert

Bart

T5

GPT2

GPTNeo

GPTJ

Electra

Roberta

Add documentation

https://tunib-ai.github.io/oslo

Remove old pipeline parallelism, kernel fusion code

We'll refurbish them using the latest methods

Kernel fusion: AOTAutograd

Pipeline parallelism: Sagemaker PP

v.1.1.2(Jan 15, 2022)
Updates

[#7] Selective Kernel Fusion

[#9] Fix argument bug

New Feature: Selective Kernel Fusion

Since version 1.1.2, you can fuse only selected kernels instead of all of them. Currently, only the Attention and MLP classes are supported.

from oslo import GPT2MLP, GPT2Attention

# MLP only fusion
model.fuse([GPT2MLP])

# Attention only fusion
model.fuse([GPT2Attention])

# MLP + Attention fusion
model.fuse([GPT2MLP, GPT2Attention])
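The class-based selection can be pictured with a toy module tree. This sketch performs no actual kernel fusion and is not OSLO's implementation. The `Module`, `Attention`, `MLP` and `Block` classes and the `fuse` function are hypothetical stand-ins that only show how a list of classes can pick which submodules are affected.

```python
class Module:
    """Toy stand-in for a model component with named children."""

    def __init__(self, **children):
        self.children = children
        self.fused = False


class Attention(Module):
    pass


class MLP(Module):
    pass


class Block(Module):
    pass


def fuse(module, targets=None):
    """Mark matching modules as fused and return how many were marked.

    targets=None selects every module; otherwise only instances of the
    listed classes are selected.
    """
    count = 0
    if targets is None or isinstance(module, tuple(targets)):
        module.fused = True
        count += 1
    for child in module.children.values():
        count += fuse(child, targets)
    return count


model = Block(attn=Attention(), mlp=MLP())
print(fuse(model, [MLP]))             # 1
print(model.children["mlp"].fused)    # True
print(model.children["attn"].fused)   # False
```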

v1.1(Dec 29, 2021)

[#3] Add the deployment launcher from Parallelformers to OSLO.

from oslo import GPTNeoForCausalLM

model = GPTNeoForCausalLM.from_pretrained_with_parallel(
    "EleutherAI/gpt-neo-2.7B",
    tensor_parallel_size=2,
    pipeline_parallel_size=2,
    deployment=True  # <-- new feature !
)

You can use the deployment launcher by passing deployment=True. Please refer to USAGE.md for more details.


v1.0.1(Dec 22, 2021)
Quick Fix

Support Megatron-LM style (.jsonl) file preprocessing.

v1.0(Dec 21, 2021)
December 21, 2021 Released OSLO 1.0.

Copyright 2021 TUNiB Inc. http://www.tunib.ai All Rights Reserved.

Owner

TUNiB

TUNiB Inc.
