CCF BDCI BERT System Tuning Competition Baseline (PyTorch version)

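This version is implemented with Hugging Face transformers on the PyTorch backend. Because it reads data through OneFlow's dataloader, OneFlow must also be installed. For reading the data from other frameworks, see the implementation of the OneflowDataloaderToPytorchDataset class.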

Usage

Install the dependencies (prerequisite: PyTorch and OneFlow are already installed in the environment)

pip install transformers pandas
git clone https://github.com/tea321000/hugging_face_competition
cd hugging_face_competition

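Run train_BERT_base.sh and train_BERT_large.sh to launch the single-machine, single-GPU baselines. Keeping the other parameters unchanged, adjust the hidden_size parameter in the shell scripts to see how GPU memory usage changes with hidden_size (you can monitor it directly with watch -n 0.1 nvidia-smi).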

python train.py \
--ofrecord_path sample_seq_len_512_example \
--lr 1e-4 --epochs 10 \
--train_batch_size 2 \
--seq_length=512 \
--max_predictions_per_seq=80 \
--num_hidden_layers=24 \
--num_attention_heads=16 \
--vocab_size=30522 \
--hidden_size=1024  # the parameter to tune
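
To get a rough feel for why GPU memory moves with hidden_size, the sketch below counts the trainable parameters of a standard BERT encoder from the flags used in the command above. It is an illustration only, not code from this repository: the function name bert_encoder_param_count is hypothetical, and it assumes intermediate_size = 4 * hidden_size, 512 position embeddings, 2 token types, and the usual pooler layer. It ignores any pre-training heads that train.py may add, and actual memory use also depends on gradients, optimizer state, and activations.

```python
def bert_encoder_param_count(hidden_size, num_hidden_layers, vocab_size=30522,
                             max_position_embeddings=512, type_vocab_size=2,
                             intermediate_size=None):
    """Count parameters of a standard BERT encoder (embeddings + layers + pooler).

    Assumption: intermediate_size defaults to 4 * hidden_size, as in the
    original BERT configurations. Pre-training heads are not counted.
    """
    if intermediate_size is None:
        intermediate_size = 4 * hidden_size
    h, i = hidden_size, intermediate_size

    layer_norm = 2 * h  # scale and bias vectors
    # Word, position, and token-type embedding tables, then one LayerNorm.
    embeddings = (vocab_size + max_position_embeddings + type_vocab_size) * h + layer_norm
    # Query, key, value, and output projections (weights + biases), then LayerNorm.
    attention = 4 * (h * h + h) + layer_norm
    # Two dense layers (h -> i -> h) with biases, then LayerNorm.
    feed_forward = (h * i + i) + (i * h + h) + layer_norm
    # Pooler: one dense layer applied to the first token.
    pooler = h * h + h

    return embeddings + num_hidden_layers * (attention + feed_forward) + pooler


# Sweep hidden_size with the other flags fixed, as the README suggests.
for hidden in (512, 768, 1024):
    params = bert_encoder_param_count(hidden_size=hidden, num_hidden_layers=24)
    weights_mib = params * 4 / 2**20  # fp32 weights only, 4 bytes per parameter
    print(f"hidden_size={hidden}: {params:,} parameters, ~{weights_mib:.0f} MiB of fp32 weights")
```

The per-layer terms grow quadratically in hidden_size, so the weight memory rises much faster than the value of the flag itself. Compare these estimates with what nvidia-smi reports to see how much of the footprint comes from sources other than the weights.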

Owner

Ziqi Zhou
