poutyne-transformers

Train 🤗-transformers models with Poutyne.

Installation

pip install poutyne-transformers

Example

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from datasets import load_dataset
from torch.utils.data import DataLoader
from torch import optim
from poutyne import Model
from poutyne_transformers import TransformerCollator, model_loss, ModelWrapper

print('Loading model & tokenizer.')
transformer = AutoModelForSequenceClassification.from_pretrained('distilbert-base-cased', num_labels=2, return_dict=True)
tokenizer = AutoTokenizer.from_pretrained('distilbert-base-cased')

print('Loading & preparing dataset.')
dataset = load_dataset("imdb")
dataset = dataset.map(lambda entry: tokenizer(entry['text'], add_special_tokens=True, padding='max_length', truncation=True), batched=True)
dataset = dataset.remove_columns(['text'])
dataset.set_format('torch')

collate_fn = TransformerCollator()
# Shuffle the training data so that batches are not drawn in dataset order.
train_dataloader = DataLoader(dataset['train'], batch_size=16, shuffle=True, collate_fn=collate_fn)
test_dataloader = DataLoader(dataset['test'], batch_size=16, collate_fn=collate_fn)

print('Preparing training.')
wrapped_transformer = ModelWrapper(transformer)
optimizer = optim.AdamW(wrapped_transformer.parameters(), lr=5e-5)
device = torch.device('cuda:0' if torch.cuda.is_available() else "cpu")
model = Model(wrapped_transformer, optimizer, loss_function=model_loss, device=device)

print('Starting training.')
model.fit_generator(train_dataloader, test_dataloader, epochs=1)
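
Poutyne's fit_generator iterates over batches shaped as (x, y) pairs, while a tokenized 🤗 dataset yields one dict per example, so a collate function has to bridge the two. The sketch below illustrates only that batch shape in plain Python. It is not how TransformerCollator is implemented; the helper name collate_to_xy, the label_key parameter, and the sample token ids are all hypothetical.

```python
from typing import Any, Dict, List, Tuple


def collate_to_xy(
    examples: List[Dict[str, Any]], label_key: str = "label"
) -> Tuple[Dict[str, List[Any]], List[Any]]:
    """Hypothetical helper: merge per-example dicts into one (x, y) batch."""
    x: Dict[str, List[Any]] = {}
    y: List[Any] = []
    for example in examples:
        for key, value in example.items():
            if key == label_key:
                # Targets go into y, the second element of the pair.
                y.append(value)
            else:
                # Every other field is grouped by name into the inputs dict.
                x.setdefault(key, []).append(value)
    return x, y


# Two made-up tokenized examples (the ids are illustrative, not real vocabulary ids).
examples = [
    {"input_ids": [101, 7, 102], "attention_mask": [1, 1, 1], "label": 0},
    {"input_ids": [101, 9, 102], "attention_mask": [1, 1, 1], "label": 1},
]

x, y = collate_to_xy(examples)
print(sorted(x))
print(y)
```

In the real example above, the collator and ModelWrapper handle this step with tensors, and model_loss supplies the loss to Poutyne.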

Owner

Lennart Keller
