Journalism AI – Quotes extraction for modular journalism

This repo contains the code for the Guardian and AFP contribution to the JournalismAI Festival 2021.

Further reading can be found in our blog post.

The aim of the project is to extract quotes from news articles using Named Entity Recognition, add coreferencing information and format the results for an exploratory search tool.

The contribution consists of several self-contained pieces of work, namely:

a regular expression pipeline attempting to extract quotes by matching patterns
a rule set to define different types of quotes and guide the quote annotation
custom annotation recipes for the Prodigy software enabling quick and efficient data annotation
a post-processing pipeline for extracting quotes using a trained spaCy model and adding coreferencing information
example data and data schema for displaying the extracted quote information in a search tool
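
As an illustration of the pattern-matching idea behind the regular expression pipeline, here is a minimal sketch in Python. It is not the code in regex_pipeline/: the pattern, the list of reporting verbs, the function name extract_quotes and the output keys are all assumptions made for this example, and it only handles the simple shape "quote" + verb + capitalised name.

```python
import re

# Hypothetical pattern (not taken from the repo): a quoted span, then a
# reporting verb, then one or more capitalised words taken as the speaker.
QUOTE_PATTERN = re.compile(
    r'["“](?P<quote>[^"”]+)["”],?\s+'
    r'(?P<verb>said|says|told|added)\s+'
    r'(?P<speaker>[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)'
)


def extract_quotes(text):
    """Return one dict per matched quote, with its verb and speaker."""
    results = []
    for match in QUOTE_PATTERN.finditer(text):
        results.append(
            {
                # Drop surrounding whitespace and a trailing comma inside the quote.
                "quote": match.group("quote").strip().rstrip(","),
                "verb": match.group("verb"),
                "speaker": match.group("speaker"),
            }
        )
    return results


if __name__ == "__main__":
    sample = '"We are ready," said Jane Doe. Later, “It was a long night,” added John Smith.'
    for item in extract_quotes(sample):
        print(item)
```

A fixed pattern like this misses many real-world cases, such as a speaker named before the quote, a pronoun as speaker, or a quote split across sentences. That limitation is one reason a rule set, annotation and a trained model can be useful alongside pattern matching.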

Repo structure

Each folder in this repo reflects one of the pieces of work mentioned above.

regex_pipeline/ – code to run the regular expression-based quote extraction
annotation_rules/ – document with rules and definitions to guide the quote annotation step
annotation_scripts/ – custom annotation scripts for Prodigy
coreference/ – proof of concept for a rules-based coreferencing tool
schema/ – data output schema and example data

Each folder has its own README file with instructions to set up and run that piece of work.
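
To give a rough idea of what a rules-based coreferencing step can look like, the sketch below applies a single hypothetical rule: when a quote's speaker is a pronoun, attribute it to the most recently named speaker. This is not the implementation in coreference/; the pronoun list, the function name resolve_speakers and the dictionary keys are assumptions chosen for the example.

```python
# Hypothetical pronoun list; a real rule set would likely be larger.
PRONOUNS = {"he", "she", "they"}


def resolve_speakers(quotes):
    """Add a 'resolved_speaker' key to each quote dict.

    Rule (assumed for illustration): a pronoun speaker is replaced by the
    most recent named speaker; if none has been seen, the pronoun is kept.
    """
    resolved = []
    last_named = None
    for quote in quotes:
        speaker = quote["speaker"]
        if speaker.lower() in PRONOUNS:
            resolved_speaker = last_named or speaker
        else:
            last_named = speaker
            resolved_speaker = speaker
        # Copy the input dict so the caller's data is not modified.
        resolved.append({**quote, "resolved_speaker": resolved_speaker})
    return resolved


if __name__ == "__main__":
    example = [
        {"quote": "We are ready", "speaker": "Jane Doe"},
        {"quote": "It took months", "speaker": "she"},
    ]
    for item in resolve_speakers(example):
        print(item["speaker"], "->", item["resolved_speaker"])
```

A rule this simple ignores gender, number and intervening speakers, so it can only serve as a starting point for a proof of concept.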

Owner

Journalism AI collab 2021
