Official code repository of the paper Linear Transformers Are Secretly Fast Weight Programmers.

Last update: Dec 19, 2022

Overview

Linear Transformers Are Secretly Fast Weight Programmers

This repository contains the code accompanying the paper Linear Transformers Are Secretly Fast Weight Programmers, which was published at ICML'21. It also contains the logs of all synthetic experiments.

Synthetic Experiments

Requirements

$ cat req.txt 
jupyter==1.0.0
pandas==1.0.1
seaborn==0.10.0
torch==1.6.0
matplotlib==3.1.3
numpy==1.17.2

pip3 install -r req.txt
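
After installing, it can be useful to confirm that the active environment really matches the pins, since the versions listed are old and pip may silently leave a newer package in place. The following is a minimal sketch using only the standard library (Python 3.8+); the names `REQ_TXT`, `parse_pins`, and `find_mismatches` are hypothetical helpers and not part of the repository.

```python
from importlib import metadata

# Pins copied from req.txt as shown above.
REQ_TXT = """\
jupyter==1.0.0
pandas==1.0.1
seaborn==0.10.0
torch==1.6.0
matplotlib==3.1.3
numpy==1.17.2
"""


def parse_pins(text):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins, installed_version=metadata.version):
    """Return {package: (pinned, installed or None)} for every pin that is not met."""
    mismatches = {}
    for name, pinned in pins.items():
        try:
            found = installed_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        if found != pinned:
            mismatches[name] = (pinned, found)
    return mismatches


if __name__ == "__main__":
    for name, (pinned, found) in find_mismatches(parse_pins(REQ_TXT)).items():
        print(f"{name}: pinned {pinned}, installed {found}")
```

An empty output means every pinned package is installed at exactly the listed version.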

Rerun Experiments

Logs are provided in the synthetic/logs folder. The files in that folder were produced by running the following commands:

Setting 1 (capacity):

python3 main.py --begin=20 --end=600 --step=20 --attn_name=softmax --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=linear --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=2 --update_rule=sum

python3 main.py --begin=20 --end=600 --step=20 --attn_name=dpfp --attn_arg=3 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=64 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=128 --update_rule=sum
python3 main.py --begin=20 --end=600 --step=20 --attn_name=favor --attn_arg=512 --update_rule=sum
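
The eight Setting 1 commands differ only in the attention function and its argument, so they can be generated from a small table instead of being typed out. This is a minimal sketch, assuming it is run from the folder that contains main.py; `SETTING1_CONFIGS` and `build_commands` are hypothetical names and not part of the repository.

```python
import shlex

# (attn_name, attn_arg) pairs copied from the Setting 1 commands above.
# None means the command is run without an --attn_arg flag.
SETTING1_CONFIGS = [
    ("softmax", None),
    ("linear", None),
    ("dpfp", 1),
    ("dpfp", 2),
    ("dpfp", 3),
    ("favor", 64),
    ("favor", 128),
    ("favor", 512),
]


def build_commands(configs, begin=20, end=600, step=20, update_rule="sum"):
    """Return one argument list per configuration (hypothetical helper)."""
    commands = []
    for attn_name, attn_arg in configs:
        cmd = [
            "python3", "main.py",
            f"--begin={begin}", f"--end={end}", f"--step={step}",
            f"--attn_name={attn_name}",
        ]
        if attn_arg is not None:
            cmd.append(f"--attn_arg={attn_arg}")
        cmd.append(f"--update_rule={update_rule}")
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    for cmd in build_commands(SETTING1_CONFIGS):
        # Only prints the command line; to execute it, pass the list to
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

Printing first makes it easy to compare the generated lines against the list above before launching any runs.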

Setting 2 (update rule):

python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=sum --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=tanh --update_rule=fwm --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=1 --update_rule=fwm --replace

python3 main.py --begin=20 --end=200 --step=20 --attn_name=dpfp --attn_arg=2 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=linear --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=favor --attn_arg=64 --update_rule=ours --replace
python3 main.py --begin=20 --end=200 --step=20 --attn_name=favor --attn_arg=128 --update_rule=ours --replace
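
To illustrate why the update rule matters when an association has to be replaced, here is a toy sketch of two ways to write a key-value pair into a fast weight matrix. It is an assumption that `--update_rule=sum` corresponds to the purely additive write and `--update_rule=ours` to the delta-rule-style write described in the paper; this README does not spell out the flag semantics, and the code below is not taken from main.py. The feature map and any normalisation are omitted, the key is chosen with unit norm, and plain Python lists are used to keep the example dependency-free.

```python
def matvec(W, k):
    """Read from the memory: return W k."""
    return [sum(w * x for w, x in zip(row, k)) for row in W]


def sum_update(W, k, v):
    """Purely additive write: W <- W + v k^T."""
    return [[w + vi * kj for w, kj in zip(row, k)] for row, vi in zip(W, v)]


def delta_update(W, k, v, beta=1.0):
    """Delta-rule-style write: W <- W + beta * (v - W k) k^T."""
    v_old = matvec(W, k)  # value currently stored under key k
    diff = [beta * (vi - oi) for vi, oi in zip(v, v_old)]
    return sum_update(W, k, diff)


d = 4
empty = [[0.0] * d for _ in range(d)]
k = [1.0, 0.0, 0.0, 0.0]   # unit-norm key (illustrative)
v1 = [1.0, 0.0, 0.0, 0.0]  # value written first
v2 = [0.0, 1.0, 0.0, 0.0]  # value that should replace v1

W_sum = sum_update(sum_update(empty, k, v1), k, v2)
W_delta = delta_update(delta_update(empty, k, v1), k, v2)

print(matvec(W_sum, k))    # old and new value are superimposed
print(matvec(W_delta, k))  # only the new value remains
```

With the additive write, querying with the same key returns the sum of both values; with the delta-rule-style write and `beta=1.0`, the old value is removed before the new one is stored.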

Generate figures from the logs using the following notebooks:

synthetic/setting1_generate_figure.ipynb
synthetic/setting2_generate_figure.ipynb

Language Modelling & Machine Translation

The toolkit and scripts for language modeling experiments can be found at IDSIA/lmtool-fwms.

For machine translation experiments, we ported the different attention functions implemented in the language modeling toolkit to the multi-head attention implementation in FAIRSEQ.

Citation

@inproceedings{schlag2021linear,
      title={Linear Transformers Are Secretly Fast Weight Programmers},
      author={Imanol Schlag and Kazuki Irie and J\"urgen Schmidhuber},
      booktitle={Proc. Int. Conf. on Machine Learning (ICML)},
      address={Virtual only},
      month=jul,
      year={2021}
}


Owner

Imanol Schlag
