Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"

Last update: May 18, 2022

Overview

Implementation of the paper

Balancing Training for Multilingual Neural Machine Translation

Xinyi Wang, Yulia Tsvetkov, Graham Neubig

Data:

The preprocessed and binarized data for fairseq can be downloaded here

To process the data from scratch, see the script

util_scripts/prepare_multilingual_data.sh

Training Scripts:

The training scripts for many-to-one translation of the related language group (Related M2O) are under the directory job_scripts/related_ted8_m2o/.

Our methods:

MultiDDS-S:

job_scripts/related_ted8_m2o/multidds_s.sh

MultiDDS:

job_scripts/related_ted8_m2o/multidds.sh

Baselines:

Proportional:

job_scripts/related_ted8_m2o/proportional.sh

Temperature:

job_scripts/related_ted8_m2o/temperature.sh

The scripts for Related O2M are under the directory job_scripts/related_ted8_o2m/

The scripts for Diverse M2O are under the directory job_scripts/diverse_ted8_m2o/

The scripts for Diverse O2M are under the directory job_scripts/diverse_ted8_o2m/
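
The Proportional and Temperature baselines listed above differ in how often each language pair is sampled during training. As a rough illustration only, the sketch below uses the temperature-based sampling rule commonly described in the multilingual NMT literature, where a language with n_i training examples is sampled with probability proportional to n_i ** (1 / T), and T = 1 reduces to proportional sampling. The function name, the dataset sizes, and the temperature value are hypothetical and are not taken from this repository's scripts.

```python
def sampling_probs(sizes, temperature=1.0):
    """Return per-language sampling probabilities p_i proportional to n_i ** (1 / temperature).

    Illustrative helper (hypothetical name); not the repository's implementation.
    """
    scaled = {lang: n ** (1.0 / temperature) for lang, n in sizes.items()}
    total = sum(scaled.values())
    return {lang: s / total for lang, s in scaled.items()}


# Made-up training-set sizes (sentence pairs), for illustration only.
sizes = {"high": 200_000, "mid": 50_000, "low": 5_000}

# temperature = 1 samples each language in proportion to its data size.
proportional = sampling_probs(sizes, temperature=1.0)

# A higher temperature (value chosen arbitrarily here) flattens the
# distribution, so the low-resource language is sampled more often.
smoothed = sampling_probs(sizes, temperature=5.0)

for lang in sizes:
    print(f"{lang}: proportional={proportional[lang]:.3f} smoothed={smoothed[lang]:.3f}")
```

Both baselines keep these probabilities fixed for the whole run; only the temperature changes how strongly high-resource languages dominate.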

Inference Scripts:

Each experiment script directory contains a trans.sh file that translates the test set. To translate the test set for the Related M2O MultiDDS-S model, run:

job_scripts/related_ted8_m2o/trans.sh checkpoints/related_ted8_m2o/multidds_s/

To translate the test set for another experiment, simply replace the argument with that experiment's checkpoint directory.
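
The pattern in the command above can be sketched as a small helper that builds the trans.sh invocation for a given experiment. This is only a convenience sketch: the function name is hypothetical, and it assumes that every experiment stores its checkpoints under checkpoints/<group>/<method>/, which is inferred from the single MultiDDS-S example and should be checked against the training scripts.

```python
import shlex


def trans_command(group, method):
    """Build the trans.sh command for one experiment without running it.

    Assumes checkpoints live under checkpoints/<group>/<method>/
    (inferred from the MultiDDS-S example; verify for other experiments).
    """
    script = f"job_scripts/{group}/trans.sh"
    checkpoint_dir = f"checkpoints/{group}/{method}/"
    return [script, checkpoint_dir]


# Reproduce the Related M2O MultiDDS-S command shown above.
cmd = trans_command("related_ted8_m2o", "multidds_s")
print(shlex.join(cmd))
```

To actually run the translation, the returned list could be passed to subprocess.run(cmd, check=True) from the repository root.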

Citation

Please cite as:

@inproceedings{wang2020multiDDS,
  title = {Balancing Training for Multilingual Neural Machine Translation},
  author = {Xinyi Wang and Yulia Tsvetkov and Graham Neubig},
  booktitle = {ACL},
  year = {2020},
}

Owner

Xinyi Wang
