Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures.

Overview

NLP_0-project

Group project for MFIN7036. Our goal is to predict firm profitability with text-based competition measures1. We are a "democratic" and collaborative group of five, and I mentioned our names based on our initial work division below 😄 .

Here is the outline of our project:

Data collection.

@LeiyuanHuo, jyang130, FanFanShark, xdc1999, gaojiamin1116

  • Based on file data-WRDS-list.csv, write a web-scraping algorithm to download all 10-Ks (html format) these companies filed to the SEC within 2010 to 2022 at Historical EDGAR documents, and rename them data-10K-COMPNAME-Year.html.
  • Parse html files to extract Business and MD&A sections.

Text Processing: feature extraction2

  • Part of Speech Tagging (POS) (mainly this method) to get product name, descriptions. Store these for each company.
  • Named Entity Recognition (NER) (also mainly this method) to get mentioned competitor names. Store these for each company.
  • Product texts: BoW and tf-idf for each company's product(s), and hopefully we have a term-product matrix then.
  • Competitor texts: definitely BoW, as we care about the frequency of being mentioned.
  • ‼️ We also need to combine sector and firm size/market power into competitor texts and re-count.

Text Processing: feature transformation and representation2

  • Term-product matrix: calculate cosine similarity scores for products pairwise; use score threshold to cluster products into similar groups.
  • Term-product matrix: directly apply clustering method (e.g., KMeans clustering) to product vectors, and cluster them.

Econometric Analysis and Hypothesis Testing2

  • Multivariate regression: DV is profitability (e.g., sales, revenue, Tobin's q), IV is competition measures (one from similar product count, one from mentions as competitors), also include relevant control variables.
  • Cross-section portfolios: our competition measures are cross-sectional (one for each year), so we can create long-short portfolios for both measures, and examine stock return effects.

Footnotes

  1. Two papers inspired this project. Citations: Eisdorfer, A., Froot, K., Ozik, G., & Sadka, R. (2021). Competition Links and Stock Returns. The Review of Financial Studies, The Review of financial studies, 2021-12-20. && Hoberg, G., & Phillips, G. (2016). Text-Based Network Industries and Endogenous Product Differentiation. The Journal of Political Economy, 124(5), 1423-1465.

  2. Text processing processes are based on MFIN7036 Lecture_Notes and a review paper. Citation: Marty, T., Vanstone, B., & Hahn, T. (2020). News media analytics in finance: A survey. Accounting and Finance (Parkville), 60(2), 1385-1434. 2 3

Code for paper ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization in the Loop.

Who Left the Dogs Out? Evaluation and demo code for our ECCV 2020 paper: Who Left the Dogs Out? 3D Animal Reconstruction with Expectation Maximization

Benjamin Biggs 29 Dec 28, 2022
NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥

NanoDet-Plus⚡Super fast and lightweight anchor-free object detection model. 🔥Only 980 KB(int8) / 1.8MB (fp16) and run 97FPS on cellphone🔥

4.8k Jan 07, 2023
Code for the paper "Relation of the Relations: A New Formalization of the Relation Extraction Problem"

This repo contains the code for the EMNLP 2020 paper "Relation of the Relations: A New Paradigm of the Relation Extraction Problem" (Jin et al., 2020)

YYY 27 Oct 26, 2022
Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

Karate Club is an unsupervised machine learning extension library for NetworkX. Please look at the Documentation, relevant Paper, Promo Video, and Ext

Benedek Rozemberczki 1.8k Jan 07, 2023
Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme (NeurIPS2021)

Revisiting Discriminator in GAN Compression: A Generator-discriminator Cooperative Compression Scheme (NeurIPS2021) Overview Prerequisites Linux Pytho

Shaojie Li 34 Mar 31, 2022
Source code for EquiDock: Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking (ICLR 2022)

Source code for EquiDock: Independent SE(3)-Equivariant Models for End-to-End Rigid Protein Docking (ICLR 2022) Please cite "Independent SE(3)-Equivar

Octavian Ganea 154 Jan 02, 2023
ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

Ajinkya Kulkarni 43 Nov 27, 2022
World Models with TensorFlow 2

World Models This repo reproduces the original implementation of World Models. This implementation uses TensorFlow 2.2. Docker The easiest way to hand

Zac Wellmer 234 Nov 30, 2022
Sequential GCN for Active Learning

Sequential GCN for Active Learning Please cite if using the code: Link to paper. Requirements: python 3.6+ torch 1.0+ pip libraries: tqdm, sklearn, sc

45 Dec 26, 2022
PED: DETR for Crowd Pedestrian Detection

PED: DETR for Crowd Pedestrian Detection Code for PED: DETR For (Crowd) Pedestrian Detection Paper PED: DETR for Crowd Pedestrian Detection Installati

36 Sep 13, 2022
KaziText is a tool for modelling common human errors.

KaziText KaziText is a tool for modelling common human errors. It estimates probabilities of individual error types (so called aspects) from grammatic

ÚFAL 3 Nov 24, 2022
Simple implementation of OpenAI CLIP model in PyTorch.

It was in January of 2021 that OpenAI announced two new models: DALL-E and CLIP, both multi-modality models connecting texts and images in some way. In this article we are going to implement CLIP mod

Moein Shariatnia 226 Jan 05, 2023
Statistical-Rethinking-with-Python-and-PyMC3 - Python/PyMC3 port of the examples in " Statistical Rethinking A Bayesian Course with Examples in R and Stan" by Richard McElreath

Statistical Rethinking with Python and PyMC3 This repository has been deprecated in favour of this one, please check that repository for updates, for

Osvaldo Martin 786 Dec 29, 2022
KakaoBrain KoGPT (Korean Generative Pre-trained Transformer)

KoGPT KoGPT (Korean Generative Pre-trained Transformer) https://github.com/kakaobrain/kogpt https://huggingface.co/kakaobrain/kogpt Model Descriptions

Kakao Brain 799 Dec 28, 2022
Nvdiffrast - Modular Primitives for High-Performance Differentiable Rendering

Nvdiffrast – Modular Primitives for High-Performance Differentiable Rendering Modular Primitives for High-Performance Differentiable Rendering Samuli

NVIDIA Research Projects 675 Jan 06, 2023
A Python library for differentiable optimal control on accelerators.

A Python library for differentiable optimal control on accelerators.

Google 80 Dec 21, 2022
The final project of "Applying AI to 2D Medical Imaging Data" of "AI for Healthcare" nanodegree - Udacity.

Pneumonia Detection from X-Rays Project Overview In this project, you will apply the skills that you have acquired in this 2D medical imaging course t

Omar Laham 1 Jan 14, 2022
StackNet is a computational, scalable and analytical Meta modelling framework

StackNet This repository contains StackNet Meta modelling methodology (and software) which is part of my work as a PhD Student in the computer science

Marios Michailidis 1.3k Dec 15, 2022
Swapping face using Face Mesh with TensorFlow Lite

Swapping face using Face Mesh with TensorFlow Lite

iwatake 17 Apr 26, 2022
Background-Click Supervision for Temporal Action Localization

Background-Click Supervision for Temporal Action Localization This repository is the official implementation of BackTAL. In this work, we study the te

LeYang 221 Oct 09, 2022