You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Related tags

Deep LearningYOSO
Overview

You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling

Transformer-based models are widely used in natural language processing (NLP). Central to the transformer model is the self-attention mechanism, which captures the interactions of token pairs in the input sequences and depends quadratically on the sequence length. Training such models on longer sequences is expensive. In this paper, we show that a Bernoulli sampling attention mechanism based on Locality Sensitive Hash- ing (LSH), decreases the quadratic complexity of such models to linear. We bypass the quadratic cost by considering self-attention as a sum of individual tokens associated with Bernoulli random variables that can, in principle, be sampled at once by a single hash (although in practice, this number may be a small constant). This leads to an efficient sampling scheme to estimate self-attention which relies on specific modifications of LSH (to enable deployment on GPU architectures).

Requirements

docker, nvidia-docker

Start Docker Container

Under YOSO folder, run

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES= -v "$PWD:/workspace" -it mlpen/transformers:4

For Nvidia's 30 series GPU, run

docker run --ipc=host --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES= -v "$PWD:/workspace" -it mlpen/transformers:5

Then, the YOSO folder is mapped to /workspace in the container.

BERT

Datasets

To be updated

Pre-training

To start pre-training of a specific configuration: create a folder YOSO/BERT/models/ (for example, bert-small) and write YOSO/BERT/models/ /config.json to specify model and training configuration, then under YOSO/BERT folder, run

python3 run_pretrain.py --model 
   

   

The command will create a YOSO/BERT/models/ /model folder holding all checkpoints and log file.

Pre-training from Different Model's Checkpoint

Copy a checkpoint (one of .model or .cp file) from YOSO/BERT/models/ /model folder to YOSO/BERT/models/ folder and add a key-value pair in YOSO/BERT/models/ /config.json : "from_cp": " " . One example is shown in YOSO/BERT/models/bert-small-4096/config.json. This procedure also works for extending the max sequence length of a model (For example, use bert-small pre-trained weights as initialization for bert-small-4096).

GLUE Fine-tuning

Under YOSO/BERT folder, run

python3 run_glue.py --model 
   
     --batch_size 
    
      --lr 
     
       --task 
      
        --checkpoint 
        
       
      
     
    
   

For example,

python3 run_glue.py --model bert-small --batch_size 32 --lr 3e-5 --task MRPC --checkpoint cp-0249.model

The command will create a log file in YOSO/BERT/models/ /model .

Long Range Arena Benchmark

Datasets

To be updated

Run Evaluations

To start evaluation of a specific model on a task in LRA benchmark:

  • Create a folder YOSO/LRA/models/ (for example, softmax)
  • Write YOSO/LRA/models/ /config.json to specify model and training configuration

Under YOSO/LRA folder, run

python3 run_task.py --model 
   
     --task 
    

    
   

For example, run

python3 run_task.py --model softmax --task listops

The command will create a YOSO/LRA/models/ /model folder holding the best validation checkpoint and log file. After completion, the test set accuracy can be found in the last line of the log file.

RoBERTa

Datasets

To be updated

Pre-training

To start pretraining of a specific configuration:

  • Create a folder YOSO/RoBERTa/models/ (for example, bert-small)
  • Write YOSO/RoBERTa/models/ /config.json to specify model and training configuration

Under YOSO/RoBERTa folder, run

python3 run_pretrain.py --model 
   

   

For example, run

python3 run_pretrain.py --model bert-small

The command will create a YOSO/RoBERTa/models/ /model folder holding all checkpoints and log file.

GLUE Fine-tuning

To fine-tune model on GLUE tasks:

Under YOSO/RoBERTa folder, run

python3 run_glue.py --model 
   
     --batch_size 
    
      --lr 
     
       --task 
      
        --checkpoint 
        
       
      
     
    
   

For example,

python3 run_glue.py --model bert-small --batch_size 32 --lr 3e-5 --task MRPC --checkpoint 249

The command will create a log file in YOSO/RoBERTa/models/ /model .

Citation

@article{zeng2021yoso,
  title={You Only Sample (Almost) Once: Linear Cost Self-Attention Via Bernoulli Sampling},
  author={Zhanpeng Zeng, Yunyang Xiong, Sathya N. Ravi, Shailesh Acharya, Glenn Fung, Vikas Singh},
  booktitle={Proceedings of the International Conference on Machine Learning},
  year={2021}
}
Owner
Zhanpeng Zeng
Zhanpeng Zeng
Face Mesh is a face geometry solution that estimates 468 3D face landmarks in real-time even on mobile devices

Face-Mesh Face Mesh is a face geometry solution that estimates 468 3D face landmarks in real-time even on mobile devices. It employs machine learning

Farnam Javadi 9 Dec 21, 2022
Testing and Estimation of structural breaks in Stata

xtbreak estimating and testing for many known and unknown structural breaks in time series and panel data. For an overview of xtbreak test see xtbreak

Jan Ditzen 13 Jun 19, 2022
Official implementation of the paper 'Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution' in CVPR 2022

LDL Paper | Supplementary Material Details or Artifacts: A Locally Discriminative Learning Approach to Realistic Image Super-Resolution Jie Liang*, Hu

150 Dec 26, 2022
Deploying PyTorch Model to Production with FastAPI in CUDA-supported Docker

Deploying PyTorch Model to Production with FastAPI in CUDA-supported Docker A example FastAPI PyTorch Model deploy with nvidia/cuda base docker. Model

Ming 68 Jan 04, 2023
🗺 General purpose U-Network implemented in Keras for image segmentation

TF-Unet General purpose U-Network implemented in Keras for image segmentation Getting started • Training • Evaluation Getting started Looking for Jupy

Or Fleisher 2 Aug 31, 2022
data/code repository of "C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer"

C2F-FWN data/code repository of "C2F-FWN: Coarse-to-Fine Flow Warping Network for Spatial-Temporal Consistent Motion Transfer" (https://arxiv.org/abs/

EKILI 46 Dec 14, 2022
This is a template for the Non-autoregressive Deep Learning-Based TTS model (in PyTorch).

Non-autoregressive Deep Learning-Based TTS Template This is a template for the Non-autoregressive TTS model. It contains Data Preprocessing Pipeline D

Keon Lee 13 Dec 05, 2022
Data loaders and abstractions for text and NLP

torchtext This repository consists of: torchtext.datasets: The raw text iterators for common NLP datasets torchtext.data: Some basic NLP building bloc

3.2k Jan 08, 2023
Imaginaire - NVIDIA's Deep Imagination Team's PyTorch Library

Imaginaire Docs | License | Installation | Model Zoo Imaginaire is a pytorch library that contains optimized implementation of several image and video

NVIDIA Research Projects 3.6k Dec 29, 2022
DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021)

DeepLM DeepLM: Large-scale Nonlinear Least Squares on Deep Learning Frameworks using Stochastic Domain Decomposition (CVPR 2021) Run Please install th

Jingwei Huang 130 Dec 02, 2022
(SIGIR2020) “Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback’’

Asymmetric Tri-training for Debiasing Missing-Not-At-Random Explicit Feedback About This repository accompanies the real-world experiments conducted i

yuta-saito 19 Dec 01, 2022
Tool for working with Y-chromosome data from YFull and FTDNA

ycomp ycomp is a tool for working with Y-chromosome data from YFull and FTDNA. Run ycomp -h for information on how to use the program. Installation Th

Alexander Regueiro 2 Jun 18, 2022
Code for the paper "Query Embedding on Hyper-relational Knowledge Graphs"

Query Embedding on Hyper-Relational Knowledge Graphs This repository contains the code used for the experiments in the paper Query Embedding on Hyper-

DimitrisAlivas 19 Jul 26, 2022
KoRean based ELECTRA pre-trained models (KR-ELECTRA) for Tensorflow and PyTorch

KoRean based ELECTRA (KR-ELECTRA) This is a release of a Korean-specific ELECTRA model with comparable or better performances developed by the Computa

12 Jun 03, 2022
Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers.

Less is More: Pay Less Attention in Vision Transformers Official PyTorch implementation of Less is More: Pay Less Attention in Vision Transformers. By

73 Jan 01, 2023
Eth brownie struct encoding example

eth-brownie struct encoding example Overview This repository contains an example of encoding a struct, so that it can be used in a function call, usin

Ittai Svidler 2 Mar 04, 2022
Official pytorch implementation of Active Learning for deep object detection via probabilistic modeling (ICCV 2021)

Active Learning for Deep Object Detection via Probabilistic Modeling This repository is the official PyTorch implementation of Active Learning for Dee

NVIDIA Research Projects 130 Jan 06, 2023
Performance Analysis of Multi-user NOMA Wireless-Powered mMTC Networks: A Stochastic Geometry Approach

Performance Analysis of Multi-user NOMA Wireless-Powered mMTC Networks: A Stochastic Geometry Approach Thanh Luan Nguyen, Tri Nhu Do, Georges Kaddoum

Thanh Luan Nguyen 2 Oct 10, 2022
The Deep Learning with Julia book, using Flux.jl.

Deep Learning with Julia DL with Julia is a book about how to do various deep learning tasks using the Julia programming language and specifically the

Logan Kilpatrick 67 Dec 25, 2022
This is an official implementation of CvT: Introducing Convolutions to Vision Transformers.

Introduction This is an official implementation of CvT: Introducing Convolutions to Vision Transformers. We present a new architecture, named Convolut

Microsoft 408 Dec 30, 2022