The aim of this task is to predict someone's English proficiency based on a text input.

Last update: Dec 13, 2021

Overview

English_proficiency_prediction_NLP

The data comes from the NICT JLE Corpus, available here: https://alaginrc.nict.go.jp/nict_jle/index_E.html

The corpus consists of transcripts of audio-recorded speech samples from 1,281 participants (1.2 million words, 300 hours in total) taking an English oral proficiency interview test. Based on this test, each participant received an SST (Standard Speaking Test) score between 1 (low proficiency) and 9 (high proficiency).

The goal is to build a machine learning algorithm for predicting the SST score of each participant based on their transcript.

Steps:

1 - Pre-process the dataset: extract the participant transcript (all tags). Inside the participant transcript, you can remove all other tags and keep only the English words.

2 - Process the dataset: extract features with the Bag of Words (BoW) technique.

3 - Train a classifier to predict the SST score.

4 - Compute the accuracy of your system (the proportion of participants classified correctly) and plot the confusion matrix.

5 - Try to improve your system (for example you can try to use GloVe instead of BoW).
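
Steps 1 to 4 could be sketched roughly as follows. This is a minimal illustration, not the repository's actual code. It assumes that participant utterances are wrapped in `<B>...</B>` tags, that other markup can simply be stripped while keeping the text inside it, and that scikit-learn is installed. The function names `extract_participant_text` and `train_and_evaluate` are hypothetical, and logistic regression is just one possible classifier.

```python
import re

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Assumption: participant utterances are wrapped in <B>...</B>.
PARTICIPANT_RE = re.compile(r"<B>(.*?)</B>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")      # any remaining markup inside an utterance
WORD_RE = re.compile(r"[A-Za-z']+")  # runs of ASCII letters and apostrophes


def extract_participant_text(raw: str) -> str:
    """Step 1: keep only the words found inside participant utterances."""
    utterances = PARTICIPANT_RE.findall(raw)
    # Simplification: strip the tag markup but keep the text it enclosed.
    without_tags = TAG_RE.sub(" ", " ".join(utterances))
    return " ".join(WORD_RE.findall(without_tags)).lower()


def train_and_evaluate(texts, scores, seed=0):
    """Steps 2-4: BoW features, a classifier, accuracy and confusion matrix."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, scores, test_size=0.25, stratify=scores, random_state=seed
    )
    # CountVectorizer builds the Bag of Words representation.
    model = make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(x_train, y_train)
    predicted = model.predict(x_test)
    labels = sorted(set(scores))
    accuracy = accuracy_score(y_test, predicted)
    matrix = confusion_matrix(y_test, predicted, labels=labels)
    return accuracy, matrix
```

With one cleaned transcript per participant in `texts` and the matching SST scores in `scores`, `train_and_evaluate(texts, scores)` returns the held-out accuracy and the confusion matrix as an array. The matrix can then be plotted, for example with scikit-learn's `ConfusionMatrixDisplay`, which requires matplotlib.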
