How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.

Last update: Jun 05, 2022


Overview

This repository contains the code for the following paper:

How Effective is Incongruity? Implications for Code-mix Sarcasm Detection.
Aditya Shah and Chandresh Kumar Maurya. In Proceedings of the 18th International Conference on Natural Language Processing (ICON 2021; published in the ACL Anthology).

The presentation slides are available here

Requirements

Python 3.6 or higher
PyTorch >= 1.3.0
pytorch_transformers (since renamed to transformers)
pandas, NumPy, pickle (part of the Python standard library)
fastText

Download the fastText embedding file:

The fastText embedding file can be obtained here.
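The default `--fasttext_embed_file` is `new_hing_emb`, and its format is not documented here. The sketch below assumes the file uses fastText's plain-text `.vec` layout: an optional `<num_words> <dim>` header, then one word per line followed by its vector. It shows how such a file could be turned into an embedding matrix for a vocabulary. The function names, the zero padding row, and the random initialization for out-of-vocabulary words are illustrative assumptions, not the repository's code. If the file is instead a binary `.bin` model, the `fasttext` package's `fasttext.load_model` would be the natural loader.

```python
import random
from typing import Dict, Iterable, List, Tuple


def load_vec(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse fastText .vec-style text: an optional '<num_words> <dim>' header,
    then one 'word v1 v2 ... vd' entry per line (assumed format)."""
    vectors: Dict[str, List[float]] = {}
    for i, line in enumerate(lines):
        parts = line.split()
        if i == 0 and len(parts) == 2:
            continue  # header line: '<num_words> <dim>'
        if len(parts) < 2:
            continue  # blank or malformed line
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(
    vocab: List[str],
    vectors: Dict[str, List[float]],
    dim: int,
    pad_token: str = "<pad>",
    seed: int = 0,
) -> Tuple[List[List[float]], int]:
    """Return one row per vocab word plus the number of words found in `vectors`."""
    rng = random.Random(seed)
    matrix: List[List[float]] = []
    hits = 0
    for word in vocab:
        if word in vectors:
            if len(vectors[word]) != dim:
                raise ValueError(f"{word!r} has dim {len(vectors[word])}, expected {dim}")
            matrix.append(list(vectors[word]))
            hits += 1
        elif word == pad_token:
            matrix.append([0.0] * dim)  # padding row stays all zeros
        else:
            # Out-of-vocabulary word: small random vector (an arbitrary, hypothetical choice)
            matrix.append([rng.uniform(-0.25, 0.25) for _ in range(dim)])
    return matrix, hits
```

With a real file, `load_vec` can be given an open text handle directly, e.g. `with open(path, encoding="utf-8") as f: vectors = load_vec(f)`.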

Dataset

We release a benchmark sarcasm dataset for Hinglish (code-mixed Hindi-English) to facilitate further research on code-mixed NLP.

We created the dataset with TweetScraper, a tool built on top of Scrapy, to extract code-mixed Hindi-English tweets. As queries, we used search tags such as #sarcasm, #humor, #bollywood, and #cricket, combined with the most commonly used code-mixed Hindi words. All tweets with hashtags such as #sarcasm, #sarcastic, #irony, and #humor are treated as sarcastic (positive). Non-sarcastic tweets are extracted using general hashtags such as #politics, #food, and #movie. The balanced dataset comprises 166K tweets.
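The labelling rule above can be sketched as a hashtag lookup. This is an illustration under stated assumptions. The tag sets contain only the hashtags named in this README, and the original lists end in "etc.", so the real sets are larger. Following the README, any sarcasm-type hashtag makes a tweet positive. Tweets carrying none of the listed tags get no label. The names `label_tweet`, `SARCASTIC_TAGS`, and `NON_SARCASTIC_TAGS` are hypothetical.

```python
import re
from typing import Optional

# Only the hashtags named in the README; the real lists are longer ("etc.").
SARCASTIC_TAGS = {"sarcasm", "sarcastic", "irony", "humor"}
NON_SARCASTIC_TAGS = {"politics", "food", "movie"}

HASHTAG_RE = re.compile(r"#(\w+)")


def label_tweet(text: str) -> Optional[int]:
    """Return 1 (sarcastic), 0 (non-sarcastic), or None if no label hashtag is present."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    if tags & SARCASTIC_TAGS:
        return 1  # any sarcasm-type hashtag marks the tweet as positive
    if tags & NON_SARCASTIC_TAGS:
        return 0  # general-topic hashtag only: negative
    return None  # e.g. only search tags such as #cricket: left unlabelled here
```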

Finally, we preprocess and clean the data by removing URLs, hashtags, mentions, and punctuation. The resulting files can be found here as train.csv, val.csv, and test.csv.
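The cleaning step can be sketched with regular expressions. The exact patterns used to build the released files are not documented, so the regexes below are illustrative assumptions. Because hashtags are stripped, label-bearing tags such as #sarcasm no longer appear in the model input. The hypothetical `pad_or_truncate` helper shows one way token ids could be fixed to `--seq_len` (default 56). It assumes a padding id of 0.

```python
import re
import string
from typing import List

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text: str) -> str:
    """Remove URLs, mentions, hashtags, and ASCII punctuation (assumed patterns)."""
    text = URL_RE.sub(" ", text)  # URLs first, before punctuation removal mangles them
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(" ", text)  # drops the whole tag, including label tags like #sarcasm
    text = text.translate(PUNCT_TABLE)  # ASCII punctuation only; e.g. the danda is kept
    return " ".join(text.split())  # collapse repeated whitespace


def pad_or_truncate(ids: List[int], seq_len: int = 56, pad_id: int = 0) -> List[int]:
    """Cut or right-pad a token-id sequence to exactly seq_len items."""
    if len(ids) >= seq_len:
        return ids[:seq_len]
    return ids + [pad_id] * (seq_len - len(ids))
```

For example, `clean_tweet("@user Wah kya baat hai!!! #sarcasm https://t.co/abc")` returns `"Wah kya baat hai"`.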

Arguments:

--epochs: total number of training epochs, default=10

--batch-size: training batch size, default=2

--lr: learning rate, default=5.16e-05

--hidden_size_lstm: hidden size of the LSTM, default=1024

--hidden_size_linear: hidden size of the linear layer, default=128

--seq_len: sequence length of the input text, default=56

--clip: gradient clipping threshold, default=0.218

--dropout: dropout probability, default=0.198

--num_layers: number of LSTM layers, default=1

--lstm_bidirectional: whether to use a bidirectional LSTM, default=False

--fasttext_embed_file: path to the fastText embedding file, default='new_hing_emb'

--train_dir: path to the training file, default='train.csv'

--valid_dir: path to the validation file, default='valid.csv'

--test_dir: path to the test file, default='test.csv'

--checkpoint_dir: path to the saved model checkpoint, default='selfnet.pt'

--test: run the model in test mode, default=False
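The flags above map onto Python's `argparse`. The sketch below is a hypothetical reconstruction from the documented names and defaults, not the repository's actual `main.py`. In particular, the `str2bool` helper is an assumption about how values such as `--test True` might be parsed. With a plain `type=bool`, argparse would treat the string "False" as true, because any non-empty string is truthy.

```python
import argparse


def str2bool(value: str) -> bool:
    """Hypothetical helper: turn 'True'/'False'-style strings into booleans."""
    lowered = value.strip().lower()
    if lowered in {"true", "1", "yes", "y"}:
        return True
    if lowered in {"false", "0", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(description="Code-mix sarcasm detection (sketch)")
    p.add_argument("--epochs", type=int, default=10, help="total number of training epochs")
    p.add_argument("--batch-size", type=int, default=2, help="training batch size")  # dest: batch_size
    p.add_argument("--lr", type=float, default=5.16e-05, help="learning rate")
    p.add_argument("--hidden_size_lstm", type=int, default=1024)
    p.add_argument("--hidden_size_linear", type=int, default=128)
    p.add_argument("--seq_len", type=int, default=56)
    p.add_argument("--clip", type=float, default=0.218)
    p.add_argument("--dropout", type=float, default=0.198)
    p.add_argument("--num_layers", type=int, default=1)
    p.add_argument("--lstm_bidirectional", type=str2bool, default=False)
    p.add_argument("--fasttext_embed_file", default="new_hing_emb")
    p.add_argument("--train_dir", default="train.csv")
    p.add_argument("--valid_dir", default="valid.csv")
    p.add_argument("--test_dir", default="test.csv")
    p.add_argument("--checkpoint_dir", default="selfnet.pt")
    p.add_argument("--test", type=str2bool, default=False)
    return p
```

The dataset files are described as val.csv, while `--valid_dir` defaults to `valid.csv`. Passing `--valid_dir val.csv` may therefore be needed when using the released files.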

Train

python main.py

Test

python main.py --test True
