Implementation for paper BLEU: a Method for Automatic Evaluation of Machine Translation

Last update: Oct 07, 2021

Overview

BLEU Score

Implementation for paper:

BLEU: a Method for Automatic Evaluation of Machine Translation

Author: Ba Ngoc from ProtonX

BLEU score is a popular metric to evaluate machine translation. Check out the recent Transformer project we published.

I. Usage

from bleu_score import cal_corpus_bleu_score

candidates = ['eating chicken chicken is a eating a eating chicken',
              'eating chicken chicken is not good']
references_list = [['a chicken is eating chicken', 'there is a chicken eating chicken'], [
    'a chicken is eating chicken', 'there is a chicken eating chicken']]

bleu_score = cal_corpus_bleu_score(candidates, references_list,
                      weights=(0.25, 0.25, 0.25, 0.25), N=4)

print('Bleu Score: {}'.format(bleu_score))

II. BLEU Score Formula

1. Precision

We count specific n-grams in the candidates and the number of those grams in the references. Then we calculate the proportion of two countings and get the precision.

Important to note: Count clip means that the number of typical n-grams can not exceed the maximum number of that n-grams in any single reference.

For example: if ('a', 'a') gram exists 3 times in a candidate. However, the maximum number of this gram in any single reference is 2. So we will use value 2 for calculation.

If you never heard about grams? It means that we count the number of continuous substrings with a pre-set length in a string.

Candidate 1: 'eating chicken chicken is a eating a eating chicken'

-------Unigram------


eating	3
chicken	3
is	1
a	2

-------bigrams------


eating chicken	2
chicken chicken	1
chicken is	1
is a	1
a eating	2
eating a	1

We can do the same thing with trigrams and 4-grams

2. Sentence brevity penalty

We prefer the reference with a length that is closest to the candidate's.

Checkout function get_eff_ref_length in utils.py.

c: the total lengths of all candidates

r: the total lengths of all effective reference lengths

3. BLEU Formula

N: the number of grams

w: list of pre-set weight for each gram

Implementation for paper BLEU: a Method for Automatic Evaluation of Machine Translation

Related tags

Overview

BLEU Score

1. Precision

2. Sentence brevity penalty

3. BLEU Formula

Owner

Ngoc Nguyen Ba

Python library for interactive topic model visualization. Port of the R LDAvis package.

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit

Code for our ACL 2021 paper - ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer

Source code for CsiNet and CRNet using Fully Connected Layer-Shared feedback architecture.

A Plover python dictionary allowing for consistent symbol input with specification of attachment and capitalisation in one stroke.

State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Twewy-discord-chatbot - Build a Discord AI Chatbot that Speaks like Your Favorite Character

SIGIR'22 paper: Axiomatically Regularized Pre-training for Ad hoc Search

Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.

A NLP program: tokenize method, PoS Tagging with deep learning

[AAAI 21] Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

Proquabet - Convert your prose into proquints and then you essentially have Vogon poetry

Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

Code release for "COTR: Correspondence Transformer for Matching Across Images"

Finetune gpt-2 in google colab

AI_Assistant - This is a Python based Voice Assistant.

The code for two papers: Feedback Transformer and Expire-Span.

Easy to use, state-of-the-art Neural Machine Translation for 100+ languages

English loanwords in the world's languages

WikiPron - a command-line tool and Python API for mining multilingual pronunciation data from Wiktionary