GLM (General Language Model)

Related tags

Deep LearningGLM
Overview

GLM

GLM is a General Language Model pretrained with an autoregressive blank-filling objective and can be finetuned on various natural language understanding and generation tasks.

Please refer to our paper for a detailed description of GLM:

All NLP Tasks Are Generation Tasks: A General Pretraining Framework

Zhengxiao Du*, Yujie Qian*, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang (*: equal contribution)

Part of the code is based on Megatron-LM and PET.

Pretrained Models

You can download the pretrained models used in the paper here.

Name Params File Config
GLM-Base 110M glm-base-blank.tar.bz2 model_blocklm_base.sh
GLM-Large 335M glm-large-blank.tar.bz2 model_blocklm_large.sh
GLM-Large (multi-task) 335M glm-large-generation.tar.bz2 model_blocklm_large_generation.sh
GLM-410M (multi-task) 410M glm-1.25-generation.tar.bz2 model_blocklm_1.25_generation.sh
GLM-515M (multi-task) 515M glm-1.5-generation.tar.bz2 model_blocklm_1.5_generation.sh
GLM-RoBERTa 335M glm-roberta-large-blank.tar.bz2 model_blocklm_roberta_large.sh

Installation

Clone this repo

git clone https://github.com/THUDM/GLM
cd GLM

Please first install PyTorch (we use 1.7.0) and apex, and then install other dependencies by

pip install -r requirements.txt

Usage

We provide scripts for finetuning GLM on some downstream tasks.

SuperGLUE

  • Download the SuperGlue data and check the experiment setup in scripts/finetune_superglue.sh. Note that DATA_ROOT, CHECKPOINT_PATH, SAVE_PATH need to be changed to your local path. You may also change the batch-size and nproc_per_node according to your available hardware. We suggest to use aggregated batch size 64 for MultiRC and ReCORD and 16 for other tasks.

  • Run the following script (use the COPA dataset as an example)

bash scripts/finetune_superglue.sh \
     config_tasks/model_blocklm_roberta_large.sh \
     config_tasks/task_copa.sh
  • To apply GLM to a new NLU dataset with cloze-filling finetuning, implement a DataProcessor in tasks/superglue/dataset.py for data loading and add a PVP in tasks/superglue/pvp.py for the cloze question. More details can be found here.

  • The cloze questions (prompts) used in this work are written by human. We are also studying a P-tuning (prompt tuning) approach to search for the optimal continuous prompt. Please refer to our paper and code.

Text Summarization

  • Download the Gigaword dataset and check the experiment setup in scripts/finetune_seq2seq.sh. Change DATA_ROOT, CHECKPOINT_PATH, SAVE_PATH to your local path.

  • Run the following script

bash scripts/finetune_seq2seq.sh \ 
     config_tasks/model_blocklm_large_generation.sh \ 
     config_tasks/seq_gigaword.sh
  • For calculating rouge, install file2rouge from here and run bash scripts/evaluate_seq2seq.sh

Language Modeling

LAMBADA Cloze Accuracy

bash scripts/evaluate_lm.sh \ 
     config_tasks/model_blocklm_large_generation.sh \
     config_tasks/zero_lambada.sh 

LM Perplexity

  • Download our test set of wikibook (or any dataset following the same format) and change DATA_ROOT, CHECKPOINT_PATH in scripts/evaluate_lm.sh
  • Run the following script
    bash scripts/evaluate_lm.sh \ 
       config_tasks/model_blocklm_large_generation.sh \
       config_tasks/zero_lm.sh 

Blank Language Model

  • Download the Yahoo dataset and check the experiment setup in scripts/finetune_blank.sh. Change DATA_ROOT, CHECKPOINT_PATH, SAVE_PATH to your local path.

  • Run the following script

bash scripts/finetune_blank.sh \ 
     config_tasks/model_blocklm_large.sh \ 
     config_tasks/seq_blank.sh

Blank Filling (Interactive)

  • Change CHECKPOINT_PATH to your local path. Run the following script
bash scripts/generate_block.sh \
     config_tasks/model_blocklm_large.sh

Example:

Context: Ng is an adjunct professor at [MASK] (formerly associate professor and Director of its Stanford AI Lab or SAIL ). Also a pioneer in online education, Ng co-founded Coursera and deeplearning.ai.

GLM: [CLS] ng is an adjunct professor at [MASK] ( formerly associate professor and director of its stanford ai lab or sail ) . also a pioneer in online education , ng co - founded coursera and deeplearning . ai . [PAD] <|startofpiece|> the stanford university

Citation

Please cite our paper if you find this code useful for your research:

@article{DBLP:journals/corr/abs-2103-10360,
  author    = {Zhengxiao Du and
               Yujie Qian and
               Xiao Liu and
               Ming Ding and
               Jiezhong Qiu and
               Zhilin Yang and
               Jie Tang},
  title     = {All {NLP} Tasks Are Generation Tasks: {A} General Pretraining Framework},
  journal   = {CoRR},
  volume    = {abs/2103.10360},
  year      = {2021},
  url       = {https://arxiv.org/abs/2103.10360}
}
Owner
THUDM
Data Mining Research Group at Tsinghua University
THUDM
The dynamics of representation learning in shallow, non-linear autoencoders

The dynamics of representation learning in shallow, non-linear autoencoders The package is written in python and uses the pytorch implementation to ML

Maria Refinetti 4 Jun 08, 2022
A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing"

A PyTorch implementation of "Pathfinder Discovery Networks for Neural Message Passing" (WebConf 2021). Abstract In this work we propose Pathfind

Benedek Rozemberczki 49 Dec 01, 2022
Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System

Inverse Optimal Control Adapted to the Noise Characteristics of the Human Sensorimotor System This repository contains code for the paper Schultheis,

2 Oct 28, 2022
The "breathing k-means" algorithm with datasets and example notebooks

The Breathing K-Means Algorithm (with examples) The Breathing K-Means is an approximation algorithm for the k-means problem that (on average) is bette

Bernd Fritzke 75 Nov 17, 2022
Hitters Linear Regression - Hitters Linear Regression With Python

Hitters_Linear_Regression Kullanacağımız veri seti Carnegie Mellon Üniversitesi'

AyseBuyukcelik 2 Jan 26, 2022
🕹️ Official Implementation of Conditional Motion In-betweening (CMIB) 🏃

Conditional Motion In-Betweening (CMIB) Official implementation of paper: Conditional Motion In-betweeening. Paper(arXiv) | Project Page | YouTube in-

Jihoon Kim 81 Dec 22, 2022
TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and i

yifan liu 147 Dec 03, 2022
Code for Neurips2021 Paper "Topology-Imbalance Learning for Semi-Supervised Node Classification".

Topology-Imbalance Learning for Semi-Supervised Node Classification Introduction Code for NeurIPS 2021 paper "Topology-Imbalance Learning for Semi-Sup

Victor Chen 40 Nov 23, 2022
Tensor-based approaches for fMRI classification

tensor-fmri Using tensor-based approaches to classify fMRI data from StarPLUS. Citation If you use any code in this repository, please cite the follow

4 Sep 07, 2022
[ICLR 2021] HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark Accepted as a spotlight paper at ICLR 2021. Table of content File structure Prerequi

72 Jan 03, 2023
Official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification

CrossViT This repository is the official implementation of CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification. ArXiv If

International Business Machines 168 Dec 29, 2022
A Bayesian cognition approach for belief updating of correlation judgement through uncertainty visualizations

Overview Code and supplemental materials for Karduni et al., 2020 IEEE Vis. "A Bayesian cognition approach for belief updating of correlation judgemen

Ryan Wesslen 1 Feb 08, 2022
SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs

SMORE: Knowledge Graph Completion and Multi-hop Reasoning in Massive Knowledge Graphs SMORE is a a versatile framework that scales multi-hop query emb

Google Research 135 Dec 27, 2022
WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

WeakVRD-Captioning - Implementation of paper Improving Image Captioning with Better Use of Caption

30 Oct 28, 2022
Image-to-Image Translation in PyTorch

CycleGAN and pix2pix in PyTorch New: Please check out contrastive-unpaired-translation (CUT), our new unpaired image-to-image translation model that e

Jun-Yan Zhu 19k Jan 07, 2023
Code for generating a single image pretraining dataset

Single Image Pretraining of Visual Representations As shown in the paper A critical analysis of self-supervision, or what we can learn from a single i

Yuki M. Asano 12 Dec 19, 2022
A real-time motion capture system that estimates poses and global translations using only 6 inertial measurement units

TransPose Code for our SIGGRAPH 2021 paper "TransPose: Real-time 3D Human Translation and Pose Estimation with Six Inertial Sensors". This repository

Xinyu Yi 261 Dec 31, 2022
Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings

Prompt-BERT: Prompt makes BERT Better at Sentence Embeddings Results on STS Tasks Model STS12 STS13 STS14 STS15 STS16 STSb SICK-R Avg. unsup-prompt-be

196 Jan 08, 2023
Tool cek opsi checkpoint facebook!

tool apa ini? cek_opsi_facebook adalah sebuah tool yang mengecek opsi checkpoint akun facebook yang terkena checkpoint! tujuan dibuatnya tool ini? too

Muhammad Latif Harkat 2 Jul 17, 2022
All-in-one Docker container that allows a user to explore Nautobot in a lab environment.

Nautobot Lab This container is not for production use! Nautobot Lab is an all-in-one Docker container that allows a user to quickly get an instance of

Nautobot 29 Sep 16, 2022