ElasticBERT: A pre-trained model with multi-exit transformer architecture.

Last update: Dec 14, 2022


ElasticBERT

This repository contains finetuning code and checkpoints for ElasticBERT.

Towards Efficient NLP: A Standard Evaluation and A Strong Baseline

Xiangyang Liu, Tianxiang Sun, Junliang He, Lingling Wu, Xinyu Zhang, Hao Jiang, Zhao Cao, Xuanjing Huang, Xipeng Qiu

Requirements

We recommend using Anaconda to set up the experiment environment:

conda create -n elasticbert python=3.8.8
conda activate elasticbert
conda install pytorch==1.8.1 cudatoolkit=11.1 -c pytorch -c conda-forge
pip install -r requirements.txt
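
After activating the environment, a quick sanity check can confirm that the interpreter and PyTorch match the versions pinned above. This is a minimal sketch using only the standard library. The `RECOMMENDED` table and the `installed_versions` and `check_environment` helpers are hypothetical, not part of the repository. Because a CUDA build of PyTorch may report a local suffix such as `+cu111`, only the part before `+` is compared.

```python
import platform
from importlib import metadata
from typing import Dict, Optional

# Versions pinned in the conda commands above (hypothetical helper, not repo code).
RECOMMENDED = {"python": "3.8.8", "torch": "1.8.1"}


def installed_versions() -> Dict[str, Optional[str]]:
    """Return the running Python version and the installed torch version (None if absent)."""
    try:
        torch_version: Optional[str] = metadata.version("torch")
    except metadata.PackageNotFoundError:
        torch_version = None
    return {"python": platform.python_version(), "torch": torch_version}


def check_environment(found: Dict[str, Optional[str]]) -> Dict[str, bool]:
    """Compare found versions with RECOMMENDED, ignoring a local suffix like '+cu111'."""
    result: Dict[str, bool] = {}
    for name, wanted in RECOMMENDED.items():
        version = found.get(name)
        result[name] = version is not None and version.split("+")[0] == wanted
    return result


if __name__ == "__main__":
    print(check_environment(installed_versions()))
```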

Pre-trained Models

We provide the pre-trained weights of ElasticBERT-BASE and ElasticBERT-LARGE, which can be used directly with Hugging Face Transformers.

ElasticBERT-BASE: 12 layers, 12 attention heads, hidden size 768.
ElasticBERT-LARGE: 24 layers, 16 attention heads, hidden size 1024.
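
As a quick consistency check on these sizes, assume standard BERT-style multi-head attention, where the hidden size is split evenly across heads. Under that assumption, both models work with 64 dimensions per head. The `CONFIGS` table and the `head_dim` helper below are illustrative names, not part of the repository.

```python
from typing import Dict

# Sizes as listed above; CONFIGS and head_dim are illustrative names, not repo code.
CONFIGS: Dict[str, Dict[str, int]] = {
    "ElasticBERT-BASE": {"layers": 12, "heads": 12, "hidden": 768},
    "ElasticBERT-LARGE": {"layers": 24, "heads": 16, "hidden": 1024},
}


def head_dim(cfg: Dict[str, int]) -> int:
    """Per-head size, assuming the hidden size is split evenly across attention heads."""
    if cfg["hidden"] % cfg["heads"] != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return cfg["hidden"] // cfg["heads"]


for name, cfg in CONFIGS.items():
    print(f"{name}: {cfg['layers']} layers x {cfg['heads']} heads x {head_dim(cfg)} dims")
# ElasticBERT-BASE: 12 layers x 12 heads x 64 dims
# ElasticBERT-LARGE: 24 layers x 16 heads x 64 dims
```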

The pre-trained weights can be downloaded from the Hugging Face Hub under the following model names:

Model                `MODEL_NAME`
ElasticBERT-BASE     fnlp/elasticbert-base
ElasticBERT-LARGE    fnlp/elasticbert-large

Downstream task datasets

The GLUE task datasets can be downloaded from the GLUE leaderboard.

The ELUE task datasets can be downloaded from the ELUE leaderboard.

Finetuning in static usage

We provide finetuning code for static usage of ElasticBERT on both GLUE and ELUE tasks.

For GLUE:

cd finetune-static
bash finetune_glue.sh

For ELUE:

cd finetune-static
bash finetune_elue.sh
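
In contrast with the dynamic usage described below, static usage runs every input through the same, fixed number of layers. This sketch assumes static usage means running ElasticBERT at a fixed depth (for example, keeping only its bottom layers) with a single classifier on top. The toy layers and the `run_static` name are hypothetical stand-ins for transformer layers, not the repository's code.

```python
from typing import Callable, List, Sequence

Hidden = List[float]
Layer = Callable[[Hidden], Hidden]


def run_static(layers: Sequence[Layer], depth: int, hidden: Hidden) -> Hidden:
    """Apply exactly the bottom `depth` layers; every input pays the same cost."""
    if not 1 <= depth <= len(layers):
        raise ValueError(f"depth must be in [1, {len(layers)}], got {depth}")
    for layer in layers[:depth]:
        hidden = layer(hidden)
    return hidden


# Toy stand-ins for 12 transformer layers: each one adds 1.0 to every feature.
toy_layers: List[Layer] = [lambda h: [v + 1.0 for v in h] for _ in range(12)]

print(run_static(toy_layers, depth=6, hidden=[0.0, 0.5]))  # [6.0, 6.5]
```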

Finetuning in dynamic usage

We provide finetuning code that applies two kinds of early-exit methods to ElasticBERT.

For early exiting with the entropy criterion:

cd finetune-dynamic
bash finetune_elue_entropy.sh
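
The entropy criterion can be sketched as follows. After each layer, an internal classifier produces a probability distribution over labels, and inference stops at the first layer whose prediction entropy falls below a threshold. This is the general entropy-based early-exit rule. The threshold value, the `entropy_exit` name, and the precomputed per-layer probabilities are illustrative assumptions, not the settings or code used by `finetune_elue_entropy.sh`.

```python
import math
from typing import Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_exit(per_layer_probs: Sequence[Sequence[float]], threshold: float) -> Tuple[int, int]:
    """Return (exit_layer, predicted_label), with layers numbered from 1.

    Exits at the first layer whose prediction entropy is below `threshold`;
    if no layer is confident enough, the last layer's prediction is used.
    """
    for layer, probs in enumerate(per_layer_probs, start=1):
        if entropy(probs) < threshold or layer == len(per_layer_probs):
            label = max(range(len(probs)), key=lambda i: probs[i])
            return layer, label
    raise ValueError("per_layer_probs must not be empty")


# Hypothetical outputs of the internal classifiers at layers 1-4 for a 2-class task.
layer_probs = [[0.55, 0.45], [0.70, 0.30], [0.95, 0.05], [0.99, 0.01]]
print(entropy_exit(layer_probs, threshold=0.3))  # (3, 0)
```

A lower threshold trades speed for accuracy: fewer inputs are confident enough to leave early, so more of them run through deeper layers.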

For early exiting with the patience criterion:

cd finetune-dynamic
bash finetune_elue_patience.sh
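
The patience criterion can be sketched in the same style, following patience-based early exiting (PABEE). Inference stops once the internal classifiers of `patience` consecutive layers agree with the previous layer's prediction. The `patience_exit` helper, this counting convention, and the example predictions are illustrative assumptions, not the logic of `finetune_elue_patience.sh`.

```python
from typing import Optional, Sequence, Tuple


def patience_exit(per_layer_preds: Sequence[int], patience: int) -> Tuple[int, int]:
    """Return (exit_layer, predicted_label), with layers numbered from 1.

    A counter grows each time a layer repeats the previous layer's prediction
    and resets when it changes; inference exits once the counter reaches
    `patience`, otherwise the last layer's prediction is used.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if not per_layer_preds:
        raise ValueError("per_layer_preds must not be empty")
    counter = 0
    previous: Optional[int] = None
    for layer, pred in enumerate(per_layer_preds, start=1):
        counter = counter + 1 if pred == previous else 0
        previous = pred
        if counter >= patience:
            return layer, pred
    return len(per_layer_preds), per_layer_preds[-1]


# Hypothetical argmax predictions of the internal classifiers at layers 1-6.
preds = [1, 0, 0, 0, 1, 1]
print(patience_exit(preds, patience=2))  # (4, 0)
```

Unlike the entropy rule, this criterion looks only at the predicted labels, not at their confidence, so it needs no calibrated probabilities.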

Please see our paper for more details!

Contact

If you have any problems, please raise an issue or contact Xiangyang Liu.

Citation

If you find this repo helpful, we would appreciate it if you cite the corresponding paper:

@article{liu2021elasticbert,
  author    = {Xiangyang Liu and
               Tianxiang Sun and
               Junliang He and
               Lingling Wu and
               Xinyu Zhang and
               Hao Jiang and
               Zhao Cao and
               Xuanjing Huang and
               Xipeng Qiu},
  title     = {Towards Efficient {NLP:} {A} Standard Evaluation and {A} Strong Baseline},
  journal   = {CoRR},
  volume    = {abs/2110.07038},
  year      = {2021},
  url       = {https://arxiv.org/abs/2110.07038},
  eprinttype = {arXiv},
  eprint    = {2110.07038},
  timestamp = {Fri, 22 Oct 2021 13:33:09 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2110-07038.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}


Owner: fastNLP
