Process text, including tokenizing and representing sentences as vectors, and apply concepts such as RNN, LSTM, and GRU to build a classifier that detects which of 17 languages a sentence is written in.

Last update: Dec 15, 2022

Related tags

Deep Learning Language-Identifier

Overview

Language Identifier

What is this?

The goal of this project is to build a model that predicts the language of a given sentence. The work covers text processing (tokenizing sentences and representing them as vectors) and recurrent architectures such as RNN, LSTM, and GRU, which are used to build a classifier that distinguishes among 17 languages.
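As a rough illustration of the tokenizing and vector-representation step, the sketch below maps words to integer ids and pads each sentence to a fixed length. It uses only the standard library; the helper names (`build_vocab`, `encode`), the reserved ids, and the toy sentences are assumptions made for this example and are not taken from the project, which may well rely on TensorFlow's own text utilities instead.

```python
from collections import Counter

PAD, OOV = 0, 1  # reserved ids (assumed): padding and out-of-vocabulary


def build_vocab(sentences, max_words=10000):
    """Map the most frequent lowercase words to integer ids starting at 2."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    return {word: i + 2 for i, (word, _) in enumerate(counts.most_common(max_words))}


def encode(sentence, vocab, max_len=8):
    """Turn a sentence into a fixed-length list of ids, truncated or padded at the end."""
    ids = [vocab.get(word, OOV) for word in sentence.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Toy sentences, invented for illustration only.
sentences = ["the cat sat on the mat", "le chat est sur le tapis"]
vocab = build_vocab(sentences)
encoded = [encode(s, vocab) for s in sentences]
```

Each row of `encoded` has the same length, so the rows can be stacked into a batch and passed to an embedding layer. Splitting on whitespace is a simplification; languages written without spaces between words would need character-level or subword tokenization.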

Dataset

Language Detection: a small language detection dataset consisting of text samples in 17 different languages.

Results

All models achieved high accuracy, even when using a single convolution layer instead of LSTM or GRU. GRU achieved the highest accuracy: 99% training accuracy and 94% validation accuracy.
Using a convolution layer also achieved high accuracy, about 95% validation accuracy.
Using fewer embedding dimensions lets the model reach high accuracy faster, but in the Embedding Projector a lot of words end up grouped with words from other languages.
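To make the GRU mentioned in these results more concrete, here is a minimal sketch of a single GRU time step in plain Python. It is illustrative only and is not the project's code, which uses TensorFlow layers. The function names, the parameter layout, and the convention that the update gate weights the old state are assumptions made for this example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(x, h, params):
    """One GRU time step: return the new hidden state for input x and previous state h."""
    Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh = params
    # Update gate z: how much of the old state to keep.
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wz, x), matvec(Uz, h), bz)]
    # Reset gate r: how much of the old state feeds the candidate state.
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wr, x), matvec(Ur, h), br)]
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c) for a, b, c in zip(matvec(Wh, x), matvec(Uh, rh), bh)]
    # Blend old state and candidate (assumed convention: z weights the old state).
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]


# With all-zero weights both gates equal 0.5 and the candidate is 0,
# so the new state is half of the previous one.
zeros = [[0.0, 0.0], [0.0, 0.0]]
zero_params = (zeros, zeros, [0.0, 0.0]) * 3
h_new = gru_step([1.0, -1.0], [0.4, -0.2], zero_params)
```

In a classifier, this step would be applied to the embedding of each token in turn, and the final hidden state would feed a softmax layer over the 17 languages.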

32 Embedding dimensions examples

3 Embedding dimensions examples

GRU Accuracy and Loss

GRU Confusion matrix
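A confusion matrix such as the one referenced here counts, for each true language, how often each language was predicted. The sketch below computes one in plain Python; the labels and predictions are made-up toy values, not results from this project. Scikit-learn, listed under Libraries, offers `sklearn.metrics.confusion_matrix` for the same purpose.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy data, invented for illustration only.
labels = ["English", "French", "Spanish"]
y_true = ["English", "French", "Spanish", "French", "English"]
y_pred = ["English", "French", "French", "French", "English"]
cm = confusion_matrix(y_true, y_pred, labels)
```

Off-diagonal entries show which languages the model confuses with each other, which a single accuracy figure hides.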

Libraries

TensorFlow
Scikit-learn
NumPy
Pandas
Matplotlib

Owner

Hossam Asaad
