offline-training-pipeline

This is the offline-training-pipeline for our project.

We adopt the offline training and online prediction Machine Learning System framework structure.

We used the recent DistilBERT pre-trained large-scale NLP language model and fine-tuned it for the downstream fake news classification task.

Initial fine-tune training dataset are adopted from CONSTRAINT workshop of AAAI21. For offline routine training and updating in the future, we will adopt the Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Fakenewsnet offers up-to-date datasets and is continuously been updated on a regular basis. We hope to track the lastest trend of popular fake news and broader fake news topic as well by doing offline-training of our model and achieve better performance in the online prediction.

References:

@misc{patwa2020fighting, title={Fighting an Infodemic: COVID-19 Fake News Dataset}, author={Parth Patwa and Shivam Sharma and Srinivas PYKL and Vineeth Guptha and Gitanjali Kumari and Md Shad Akhtar and Asif Ekbal and Amitava Das and Tanmoy Chakraborty}, year={2020}, eprint={2011.03327}, archivePrefix={arXiv}, primaryClass={cs.CL} }

@article{sanh2019distilbert, title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter}, author={Sanh, Victor and Debut, Lysandre and Chaumond, Julien and Wolf, Thomas}, journal={arXiv preprint arXiv:1910.01108}, year={2019} }

@article{shu2020fakenewsnet, title={Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media}, author={Shu, Kai and Mahudeswaran, Deepak and Wang, Suhang and Lee, Dongwon and Liu, Huan}, journal={Big data}, volume={8}, number={3}, pages={171--188}, year={2020}, publisher={Mary Ann Liebert, Inc., publishers 140 Huguenot Street, 3rd Floor New~…} }

This is the offline-training-pipeline for our project.

Related tags

Overview

offline-training-pipeline

Owner

NLP: SLU tagging

The code from the whylogs workshop in DataTalks.Club on 29 March 2022

Text Classification Using LSTM

AllenNLP integration for Shiba: Japanese CANINE model

Help you discover excellent English projects and get rid of disturbing by other spoken language

Open-source offline translation library written in Python. Uses OpenNMT for translations

Grover is a model for Neural Fake News -- both generation and detectio

Pipeline for chemical image-to-text competition

Gpt2-WebAPI - The objective of this API is to provide the 3 best possible responses to sentences that the user would input via http GET request as a parameter

This repository contains Python scripts for extracting linguistic features from Filipino texts.

Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"

TFIDF-based QA system for AIO2 competition

Understand Text Summarization and create your own summarizer in python

An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch

AI-powered literature discovery and review engine for medical/scientific papers

Transformer-based Text Auto-encoder (T-TA) using TensorFlow 2.

A PyTorch implementation of paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

To create a deep learning model which can explain the content of an image in the form of speech through caption generation with attention mechanism on Flickr8K dataset.

A paper list of pre-trained language models (PLMs).