Source code for "Efficient Training of BERT by Progressively Stacking"

Overview

Introduction

This repository is the code to reproduce the result of Efficient Training of BERT by Progressively Stacking. The code is based on Fairseq.

Requirements and Installation

  • PyTorch >= 1.0.0
  • For training new models, you'll also need an NVIDIA GPU and NCCL
  • Python version 3.7

After PyTorch is installed, you can install requirements with:

pip install -r requirements.txt

Getting Started

Step 1:

bash install.sh

This script downloads:

  1. Moses Decoder
  2. Subword NMT
  3. Fast BPE (In the next steps, we use Subword NMT instead of Fast BPE. Recommended if you want to generate your own dictionary on a large-scale dataset.)

These library will do cleaning, tokenization, and BPE encoding for GLUE data in step 3. They will also be helpful if you want to make your own corpus for BERT training or if you want to test our model on your own tasks.

Step 2:

bash reproduce_bert.sh

This script runs progressive stacking and train a BERT. The code is tested on 4 Tesla P40 GPUs (24GB Gmem). For different hardware, you probably need to change the maximum number of tokens per batch (by changing max-tokens and update-freq).

Step 3:

bash reproduce_glue.sh

This script fine-tunes the BERT trained in step 2. The script chooses the checkpoint trained for 400K steps, which is the same as the stacking model in our paper.

Cite

@InProceedings{pmlr-v97-gong19a,
  title = 	 {Efficient Training of {BERT} by Progressively Stacking},
  author = 	 {Gong, Linyuan and He, Di and Li, Zhuohan and Qin, Tao and Wang, Liwei and Liu, Tieyan},
  booktitle = 	 {Proceedings of the 36th International Conference on Machine Learning},
  pages = 	 {2337--2346},
  year = 	 {2019},
  editor = 	 {Chaudhuri, Kamalika and Salakhutdinov, Ruslan},
  volume = 	 {97},
  series = 	 {Proceedings of Machine Learning Research},
  address = 	 {Long Beach, California, USA},
  month = 	 {09--15 Jun},
  publisher = 	 {PMLR},
  pdf = 	 {http://proceedings.mlr.press/v97/gong19a/gong19a.pdf},
  url = 	 {http://proceedings.mlr.press/v97/gong19a.html},
}
Owner
Gong Linyuan
Gong Linyuan
Simple Telegram webscrap bot

webscrap-bot Simple Telegram webscrap bot Configs TOKEN - Get bot token from @BotFather API_ID - From my.telegram.org API_HASH - From my.telegram.org

lokaman chendekar 10 Oct 21, 2022
The Dolby.io Developer Days Getting Started with Media APIs Workshop repo.

Dolby.io Developer Days Media APIs Getting Started Application About this Workshop and Application This example is designed to get participants workin

Dolby.io Samples 2 Nov 03, 2022
WikiChecker - Repositorio oficial del complemento WikiChecker para NVDA.

WikiChecker Buscador rápido de artículos en Wikipedia. Introducción. El complemento WikiChecker para NVDA permite a los usuarios consultar de forma rá

2 Jan 10, 2022
A discord bot that utilizes Google's Rest API for Calendar, Drive, and Sheets

Bott This is a discord bot that utilizes Google's Rest API for Calendar, Drive, and Sheets. The bot first takes the sheet from the schedule manager in

1 Dec 04, 2021
CloudFormation Drift Remediation - Use Cloud Control API to remediate drift that was detected on a CloudFormation stack

CloudFormation Drift Remediation - Use Cloud Control API to remediate drift that was detected on a CloudFormation stack

Cloudar 36 Dec 11, 2022
asyncio client for Deta Cloud

aiodeta Unofficial client for Deta Clound Install pip install aiodeta Supported functionality Deta Base Deta Drive Decorator for cron tasks Examples i

Andrii Leitsius 19 Feb 14, 2022
Find the best repos to contribute to, right from Discord!

repo-finder-bot Find the best repos to contribute to, right from Discord! Add to your server FAQs Hmm. What's this? This is the Repo Finder Bot, a bot

Skyascii 61 Dec 25, 2022
The Bot provide Hadith API and fetch content via api.hadith.sutanlab.id

Bot Hadith-API on Telegram The Bot provide Hadith API and fetch content via api.hadith.sutanlab.id Built With Python Asynchronous HTTP protocol client

xMan 12 Feb 19, 2022
Discord Account Generator that will create Account with hCaptcha bypass. Using socks4 proxies

Account-Generator [!] This was made for education. Please use socks4 proxies for nice experiences. [!] Please install these modules - "pip3 install ht

RyanzSantos 10 Feb 23, 2022
A simple and modular Discord bot with various functionalities.

All-In-Bot for Discord A simple and modular Discord bot with various functionalities. How to use the bot? Simple! Just invite the bot to your server u

Th3J0nny 3 Jan 29, 2022
Generate discord nitro codes and check them

Discord Nitro Generator and Checker A discord nitro generator and checker for all your nitro needs Explore the docs » Report Bug · Request Feature · J

509 Jan 02, 2023
Joins a specified server on all the tokens

Joins a specified server on all the tokens. Usage python -m pip install requests python joiner.py Note Your tokens must be located in a text file call

1 Dec 21, 2021
A Telegram Bot That Provides Permanent Download Links For Sent Files.

FileStreamBot A Telegram bot to all media and documents files to web link . Report a Bug | Request Feature Demo Bot: 🍁 About This Bot : This bot will

Flux Inc. 1 Nov 02, 2021
A Telegram Bot to return Youtube Video Tags Using YoutubeTags API

YouTube-TagFind-Bot A Telegram Bot to return Youtube Video Tags Using YoutubeTags API YoutubeTags API Wrapper YoutubeTags is a python third-party api

Nuhman Pk 9 Aug 25, 2022
This is a translator that i made by myself in python with the 'googletrans' library

Translator-Python This is a translator that i made by myself in python with the 'googletrans' library This application completely made in python allow

Thadeuks 2 Jun 17, 2022
基于nonebot2开发的群管机器人qbot,支持上传并运行python代码以及一些基础管理功能

nonebot2-Eleina 基于nonebot2开发的群管机器人qbot,支持上传并运行python代码以及一些基础管理功能 Readme 环境:python3.7.3+,go-cqhttp 安装及配置:参见(https://v2.nonebot.dev/guide/installation.h

1 Dec 06, 2022
Financial portfolio optimisation in python, including classical efficient frontier, Black-Litterman, Hierarchical Risk Parity

PyPortfolioOpt has recently been published in the Journal of Open Source Software 🎉 PyPortfolioOpt is a library that implements portfolio optimizatio

Robert Martin 3.2k Jan 02, 2023
Unlimited Filter Telegram Bot 2

Mother NAther Bot Features Auto Filter Manuel Filter IMDB Admin Commands Broadcast Index IMDB search Inline Search Random pics ids and User info Stats

LɪᴏɴKᴇᴛᴛʏUᴅ 1 Oct 30, 2021
Métamorphose Renamer v2

Métamorphose 2 Métamorphose is a graphical mass renaming program for files and folders. These are the command line options: -h, --help Show hel

Métamorphose 129 Dec 30, 2022
Dicha herramienta esta creada con una api... esta api permite enviar un SMS cada 12 horas dependiendo del pais... Hay algunos paises y operadoras no están soportados.

SMSFree pkg install python3 pip install requests git clone https://github.com/Hidden-parker/SMSFree cd SMSFree python sms.py DISFRUTA... Dicha herrami

piter 2 Nov 14, 2021