Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

Last update: Oct 22, 2022

Related tags

Text Data & NLP Yarub_library

Overview

Yarub_library

# The Problem

Arabic is one of the most widely spoken and used languages. The "language of the Ḍād" is distinguished by its rich stock of words and forms, and it is phonetically distinctive: it includes all the sounds found in the other Semitic languages. It is also flexible, accommodating all derived and synonymous terms, with a fitting expression for every occasion.

We recognize the importance of the Arabic language and its standing among the peoples of the Middle East and the world, and we seek to include Arabic among the languages that are readily usable in artificial intelligence and natural language processing applications.

In this Omdena project, our goal was to develop open-source Python Arabic NLP libraries that the Arab world can easily use in Natural Language Processing applications such as morphological analysis, named entity recognition, sentiment analysis, word embedding, dialect identification, and part-of-speech tagging, among others. This article contains code that may be useful at any level of experience; for beginners, it is a good starting point for data collection using web scraping, with links to the official documentation pages for every library mentioned.
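As a rough illustration of the web-scraping style of data collection mentioned above, the sketch below pulls paragraph text out of an HTML page and keeps only paragraphs that contain Arabic characters. It is not taken from Yarub_library: the names `ParagraphExtractor`, `extract_arabic_paragraphs`, and `SAMPLE_HTML` are hypothetical, the page is an in-memory string rather than a real site, and only the Python standard library is used.

```python
import re
from html.parser import HTMLParser

# Matches any character in the basic Arabic Unicode block (U+0600..U+06FF).
ARABIC_CHAR = re.compile(r"[\u0600-\u06FF]")


class ParagraphExtractor(HTMLParser):
    """Hypothetical helper: collects the text of every <p> element."""

    def __init__(self):
        super().__init__()
        self._in_paragraph = False
        self._buffer = []
        self.paragraphs = []

    def _flush(self):
        # Join buffered text pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # An unclosed previous <p> is ended implicitly by a new one.
            if self._in_paragraph:
                self._flush()
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "p" and self._in_paragraph:
            self._flush()
            self._in_paragraph = False

    def handle_data(self, data):
        # Text inside inline tags such as <b> still belongs to the paragraph.
        if self._in_paragraph:
            self._buffer.append(data)


def extract_arabic_paragraphs(html):
    """Return the <p> texts of `html` that contain at least one Arabic character."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return [p for p in parser.paragraphs if ARABIC_CHAR.search(p)]


# Made-up sample page; a real scraper would download the HTML first.
SAMPLE_HTML = """
<html><body>
<p>Hello world</p>
<p>اللغة   العربية <b>جميلة</b></p>
<div>ignored</div>
</body></html>
"""

if __name__ == "__main__":
    for paragraph in extract_arabic_paragraphs(SAMPLE_HTML):
        print(paragraph)
```

In practice the HTML would come from an HTTP request to a site whose terms permit scraping, and the collected paragraphs would then be cleaned and labeled before being used as training data.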


Owner

BADER ALABDAN
