The official GitHub repository for Towards Continual Knowledge Learning of Language Models

Last update: Jan 07, 2023

Overview

Towards Continual Knowledge Learning of Language Models

This is the official GitHub repository for the paper Towards Continual Knowledge Learning of Language Models (arXiv:2110.03215).

To reproduce our results, follow these steps:

1. Create conda environment and install requirements

conda create -n ckl python=3.8 && conda activate ckl
pip install -r requirements.txt

Also, make sure to install the PyTorch build that matches your CUDA version and environment. Refer to https://pytorch.org/ for the available builds.

# For CUDA 10.x
pip3 install torch torchvision torchaudio
# For CUDA 11.x
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
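After installing, it can help to confirm which PyTorch build is active in the environment. The following is a minimal sketch, not part of this repository; `describe_torch` is a hypothetical helper name, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch() -> str:
    """Summarize the installed PyTorch build, or report that it is missing."""
    # Hypothetical helper for illustration; not shipped with this repository.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    return (
        f"torch {torch.__version__}, "
        f"built for CUDA {torch.version.cuda}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


if __name__ == "__main__":
    print(describe_torch())
```

If the reported CUDA version does not match the one installed on the machine, reinstall PyTorch with the matching command above.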

2. Download the data used for the experiments.

To download only the CKL benchmark dataset:

python download_ckl_data.py

To download ALL of the data used for the experiments (required to reproduce results):

python download_all_data.py

To download the (continually pretrained) model checkpoints of the main experiment (required to reproduce results):

python download_model_checkpoints.py

For the other experimental settings, such as multiple CKL phases and GPT-2, we do not provide the continually pretrained model checkpoints.

3. Reproducing Experimental Results

We provide all the configs needed to reproduce the zero-shot results of our paper. Model checkpoints are provided only for the main experimental setting (full_setting) and can be downloaded with the command above.

configs
├── full_setting
│   ├── evaluation
│   │   ├── invariantLAMA
│   │   │   ├── t5_baseline.json
│   │   │   ├── t5_kadapters.json
│   │   │   ├── ...
│   │   ├── newLAMA
│   │   ├── newLAMA_easy
│   │   ├── updatedLAMA
│   ├── training
│   │   ├── t5_baseline.json
│   │   ├── t5_kadapters.json
│   │   ├── ...
├── GPT2
│   ├── ...
├── kilt
│   ├── ...
├── small_setting
│   ├── ...
├── split
│   ├── ...

Components in each configuration file

input_length (int) : the input sequence length
output_length (int) : the output sequence length
num_train_epochs (int) : number of training epochs
output_dir (string) : the directory to save the model checkpoints
dataset (string) : the dataset to perform zero-shot evaluation or continual pretraining
dataset_version (string) : the version of the dataset ['full', 'small', 'debug']
train_batch_size (int) : batch size used for training
learning_rate (float) : learning rate used for training
model (string) : model name in huggingface models (https://huggingface.co/models)
method (string) : method being used ['baseline', 'kadapter', 'lora', 'mixreview', 'modular_small', 'recadam']
freeze_level (int) : how much of the model to freeze during training (0 for none, 1 for freezing only the encoder, 2 for freezing all of the parameters)
gradient_accumulation_steps (int) : gradient accumulation used to match the global training batch of each method
ngpu (int) : number of gpus used for the run
num_workers (int) : number of workers for the Dataloader
resume_from_checkpoint (string) : null by default; path to the model checkpoint when resuming from a checkpoint
accelerator (string) : 'ddp' by default; the PyTorch Lightning accelerator to use
use_deepspeed (bool) : false by default; currently not extensively tested
CUDA_VISIBLE_DEVICES (string) : gpu devices that are made available for this run (e.g. "0,1,2,3", "0")
wandb_log (bool) : whether to log experiment through wandb
wandb_project (string) : project name of wandb
wandb_run_name (string) : the name of this training run
mode (string) : 'pretrain' for all configs
use_lr_scheduling (bool) : true if using learning rate scheduling
check_validation (bool) : true for evaluation (no training)
checkpoint_path (string) : path to the model checkpoint that is used for evaluation
output_log (string) : directory to log evaluation results to
split_num (int) : 1 by default; greater than 1 if there are multiple CKL phases
split (int) : which CKL phase it is
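To illustrate how some of the fields above fit together, here is a minimal sketch that checks a config against the allowed values listed above and computes the global training batch. The helper names `check_config` and `effective_batch_size` are hypothetical, the example values are placeholders rather than values from the shipped configs, and the formula assumes that train_batch_size is the per-GPU batch under 'ddp', which this README does not state explicitly.

```python
import json

# Allowed values copied from the field list above.
METHODS = {"baseline", "kadapter", "lora", "mixreview", "modular_small", "recadam"}
DATASET_VERSIONS = {"full", "small", "debug"}
FREEZE_LEVELS = {0, 1, 2}


def check_config(config: dict) -> list:
    """Return a list of problems found in a config dict (empty if none)."""
    problems = []
    if config.get("method") not in METHODS:
        problems.append(f"unknown method: {config.get('method')!r}")
    if config.get("dataset_version") not in DATASET_VERSIONS:
        problems.append(f"unknown dataset_version: {config.get('dataset_version')!r}")
    if config.get("freeze_level") not in FREEZE_LEVELS:
        problems.append(f"unknown freeze_level: {config.get('freeze_level')!r}")
    return problems


def effective_batch_size(config: dict) -> int:
    """Assumed global batch: per-GPU batch x number of GPUs x accumulation steps."""
    return (
        config["train_batch_size"]
        * config["ngpu"]
        * config["gradient_accumulation_steps"]
    )


# Hypothetical values for illustration; not taken from the shipped configs.
example = json.loads("""
{
  "method": "kadapter",
  "dataset_version": "full",
  "freeze_level": 0,
  "train_batch_size": 8,
  "ngpu": 4,
  "gradient_accumulation_steps": 2
}
""")

print(check_config(example))          # prints []
print(effective_batch_size(example))  # prints 64
```

Under this assumption, halving ngpu while doubling gradient_accumulation_steps keeps the global training batch unchanged, which is how the setting can be matched across methods on different hardware.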

For example, to run the invariantLAMA zero-shot evaluation of the continually pretrained t5_kadapters model:

python run.py --config configs/full_setting/evaluation/invariantLAMA/t5_kadapters.json

For example, to perform continual pretraining on CC-RecentNews (the main experiment) with t5_kadapters:

python run.py --config configs/full_setting/training/t5_kadapters.json
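To run many evaluation configs in sequence, the command above can be generated for every JSON file under an evaluation directory. The following is a minimal sketch under stated assumptions: `build_eval_commands` is a hypothetical helper that is not part of this repository, and it assumes every `.json` file found under the given directory is a valid config for `run.py`. It only prints the commands (a dry run) rather than executing them.

```python
import sys
from pathlib import Path


def build_eval_commands(eval_root: str) -> list:
    """Build one `python run.py --config <file>` command per JSON config found."""
    # Hypothetical helper: discovers configs by globbing, so no file names are assumed.
    configs = sorted(Path(eval_root).rglob("*.json"))
    return [[sys.executable, "run.py", "--config", p.as_posix()] for p in configs]


if __name__ == "__main__":
    for cmd in build_eval_commands("configs/full_setting/evaluation"):
        print(" ".join(cmd))  # dry run: print instead of executing
```

To actually execute each command, pass it to `subprocess.run(cmd, check=True)` from the repository root.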

Reference

@article{jang2021towards,
  title={Towards Continual Knowledge Learning of Language Models},
  author={Jang, Joel and Ye, Seonghyeon and Yang, Sohee and Shin, Joongbo and Han, Janghoon and Kim, Gyeonghun and Choi, Stanley Jungkyu and Seo, Minjoon},
  journal={arXiv preprint arXiv:2110.03215},
  year={2021}
}

Owner

Joel Jang | 장요엘
