Modified GPT using average pooling to reduce the softmax attention memory constraints.

Last update: Dec 03, 2021

Overview

NLP-GPT-Upsampling

This repository contains an implementation of OpenAI's GPT model. Taking inspiration from the Nystromformer implementation, it approximates the full softmax attention matrix so that longer sequences can be modeled in NLP language modeling tasks. The input text sequence is first shortened by a simple strided average pooling, attention is computed on the reduced-length sequence, and the attention output is then upsampled back to the original sequence length using the bilinear method.
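The pool, attend, upsample flow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the repository's TensorFlow code: the function names (`average_pool`, `attention`, `upsample_linear`, `pooled_attention`) are hypothetical, the attention is single-head with no causal mask, the pooled sequence is used directly as queries, keys and values, and the bilinear upsampling is assumed to reduce to linear interpolation along the sequence axis.

```python
import math

def average_pool(seq, stride):
    """Strided average pooling: one mean vector per non-overlapping window."""
    pooled = []
    for start in range(0, len(seq), stride):
        window = seq[start:start + stride]
        dim = len(window[0])
        pooled.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return pooled

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention (assumption: single head, no causal mask)."""
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out

def upsample_linear(seq, target_len):
    """Linear interpolation along the sequence axis back to target_len."""
    src_len = len(seq)
    out = []
    for i in range(target_len):
        # Map the output position to a source position, clamped to the valid range.
        pos = (i + 0.5) * src_len / target_len - 0.5
        pos = min(max(pos, 0.0), src_len - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, src_len - 1)
        frac = pos - lo
        out.append([(1 - frac) * a + frac * b for a, b in zip(seq[lo], seq[hi])])
    return out

def pooled_attention(x, stride):
    """Attend over the pooled sequence, then restore the original length."""
    pooled = average_pool(x, stride)             # length n -> ceil(n / stride)
    reduced = attention(pooled, pooled, pooled)  # softmax matrix is (n/stride)^2
    return upsample_linear(reduced, len(x))      # back to length n

# Toy input: 8 tokens with 2-dimensional embeddings, pooled with stride 4.
x = [[float(i), float(-i)] for i in range(8)]
y = pooled_attention(x, stride=4)
print(len(y), len(y[0]))  # 8 2
```

In this sketch the softmax is only ever computed over the pooled length, which is where the memory saving comes from; the upsampling step restores the length but cannot recover detail lost to the averaging.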

It should be noted that, because this approximation is so simple, the model is not expected to match the performance of the original GPT model utilising the full attention matrix. The tradeoff is that this naive strided averaging can model longer sequences than the original GPT implementation.
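To make the tradeoff concrete, the following sketch counts the entries in one attention score matrix with and without pooling. The helper name `attention_score_entries`, the sequence length of 1024 and the stride of 4 are illustrative assumptions, not values taken from the repository.

```python
def attention_score_entries(seq_len, stride=1):
    """Entries in one head's attention score matrix after pooling by `stride`."""
    reduced_len = -(-seq_len // stride)  # ceiling division
    return reduced_len * reduced_len

full = attention_score_entries(1024)              # full attention: 1024 x 1024
pooled = attention_score_entries(1024, stride=4)  # pooled attention: 256 x 256
print(full, pooled, full // pooled)  # 1048576 65536 16
```

The score matrix shrinks by roughly the square of the stride, so under the same memory budget a stride of 4 leaves room for a sequence about four times longer.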

Fig. 1: GPT Model Architecture (obtained from GPT paper)

Data

This repository includes code to process the Movie Dialogue dataset and the Reddit Jokes dataset. The preparation of the Movie Dialogue data follows this script closely.

To prepare the data prior to training the model(s), run

python process_movie_dialogue_subword.py

for the Movie Dialogue dataset, or

python process_reddit_jokes_subword_v1.py

for the Reddit Jokes dataset.

Training and Model Inference

Having processed the data into sub-word tokens, run

python train_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py
python infer_movie_dialogue_sw_tf_ver2_gpt_keras_upsampled.py

python train_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py
python infer_reddit_jokes_sw_tf_ver2_gpt_keras_upsampled.py

to train the model on the respective dataset and then perform inference with the trained model.


Owner

WD
