APEACH: Attacking Pejorative Expressions with Analysis on Crowd-generated Hate Speech Evaluation Datasets

Last update: Dec 06, 2022

Overview

APEACH - Korean Hate Speech Evaluation Datasets

APEACH is the first crowd-generated Korean evaluation dataset for hate speech detection. The sentences in the dataset were written by anonymous participants on the online crowdsourcing platform DeepNatural AI.

Download

You can download the APEACH benchmark set from APEACH/test.csv in this repository.
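
Once the file is downloaded, reading it could look like the minimal sketch below, which uses only the Python standard library. The column names "text" and "class" are assumptions that this page does not confirm, and load_examples and label_counts are hypothetical helper names; check the header row of test.csv and adjust the arguments as needed.

```python
import csv
from collections import Counter


def load_examples(path, text_col="text", label_col="class"):
    """Read (text, label) pairs from a CSV file that has a header row.

    text_col and label_col are ASSUMED column names; change them to
    match the actual header of the file you downloaded.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = {text_col, label_col} - set(reader.fieldnames or [])
        if missing:
            raise KeyError(f"columns not found in {path}: {sorted(missing)}")
        # Labels are kept as strings exactly as they appear in the file.
        return [(row[text_col], row[label_col]) for row in reader]


def label_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, label in examples)
```

Calling load_examples("APEACH/test.csv") and passing the result to label_counts gives a quick check of the class balance before running any model on the data.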

Dataset Description

APEACH: A hate speech evaluation dataset generated in 2021, using the generation method described in the APEACH paper.

Guidelines

APEACH-GUIDELINE

Topics

Lengths

Paper

https://arxiv.org/pdf/2202.12459.pdf

Experiment Code

Experiment Results

Model	BEEP! Dev Dataset	APEACH (Ours)
SoongsilBERT-Base	0.8261	0.8424
SoongsilBERT-Small	0.8149	0.8228
KcBERT-base	0.8088	0.8086
KcBERT-large	0.8295	0.8116
DistillKoBERT	0.7570	0.7715
KoELECTRA-V3	0.7920	0.8101
KoBERT	0.8030	0.7885
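
To make the comparison between the two columns easier to read, the sketch below copies the scores from the table and computes, for each model, the APEACH score minus the BEEP! dev score. The metric itself is not named in this section, so the code treats the numbers only as "higher is better" scores; score_gaps and best_model are hypothetical helper names introduced for illustration.

```python
# Scores copied from the results table: (BEEP! dev, APEACH).
SCORES = {
    "SoongsilBERT-Base": (0.8261, 0.8424),
    "SoongsilBERT-Small": (0.8149, 0.8228),
    "KcBERT-base": (0.8088, 0.8086),
    "KcBERT-large": (0.8295, 0.8116),
    "DistillKoBERT": (0.7570, 0.7715),
    "KoELECTRA-V3": (0.7920, 0.8101),
    "KoBERT": (0.8030, 0.7885),
}


def score_gaps(scores):
    """Return {model: APEACH score minus BEEP! dev score}, rounded to 4 places."""
    return {name: round(apeach - beep, 4) for name, (beep, apeach) in scores.items()}


def best_model(scores, column):
    """Return the top-scoring model in a column (0 = BEEP! dev, 1 = APEACH)."""
    return max(scores, key=lambda name: scores[name][column])


gaps = score_gaps(SCORES)
# Print models from the largest gain on APEACH to the largest drop.
for name, gap in sorted(gaps.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<20} {gap:+.4f}")
```

In this table the highest-scoring model differs by column: KcBERT-large on the BEEP! dev set and SoongsilBERT-Base on APEACH.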

We also share the best model trained on our dataset in this experiment as a checkpoint, a demo website, and an API.

Citation

@article{yang2022apeach,
  title={APEACH: Attacking Pejorative Expressions with Analysis on Crowd-Generated Hate Speech Evaluation Datasets},
  author={Yang, Kichang and Jang, Wonjun and Cho, Won Ik},
  journal={arXiv preprint arXiv:2202.12459},
  year={2022}
}

Contributors

The main contributors of the work ( * : equal contribution) :

License

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Owner

Kevin-Yang