Toward Model Interpretability in Medical NLP

LING380: Topics in Computational Linguistics, Final Project

James Cross ([email protected]) and Daniel Kim ([email protected]), December 2021

Code Organization

data: medical report data [LINK TO THAT REPO] used for model fine-tuning and analysis, clinical stop words, and the accuracy and entropy metrics saved during evaluation

models: checkpoints of the best-performing BERT and BioBERT models after hyperparameter optimization

notebooks:

model_training.ipynb: code to train and fine-tune BERT and BioBERT

model_evaluation.ipynb: code to run various model evaluations, visualize word importances, perform post-training clinical stopword masking, and other analyses

scripts: the same functionality as the notebooks, provided as executable Python scripts and functions
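As a rough illustration of what clinical stopword masking could look like, here is a minimal sketch. It assumes masking means replacing each stop word with BERT's `[MASK]` token before the text is passed to the model; the notebook may do this differently (for example, at the token-ID level). The function name `mask_stopwords` and the three-word stop list are hypothetical; the project's actual list is in the data directory.

```python
import re

# Hypothetical stop words for illustration only; the project's real list
# is stored in the data directory.
CLINICAL_STOPWORDS = {"patient", "history", "noted"}


def mask_stopwords(text, stopwords=CLINICAL_STOPWORDS, mask_token="[MASK]"):
    """Replace each whole-word stop word (case-insensitive) with mask_token."""

    def replace(match):
        word = match.group(0)
        return mask_token if word.lower() in stopwords else word

    # Match runs of letters so punctuation and spacing are left untouched.
    return re.sub(r"[A-Za-z]+", replace, text)


print(mask_stopwords("Patient has a history of asthma."))
# prints: [MASK] has a [MASK] of asthma.
```

Comparing model predictions on the original and masked text is one way to gauge how much the model relies on these words.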

Dependencies

All packages needed to run the code are available in the default Google Colab environment (see the Colab documentation for the full list), with two exceptions: Hugging Face Transformers (transformers), used for loading transformer models, and Captum (captum), which provides a variety of model interpretation tools.
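The short check below is one way to confirm that the two extra packages are importable before running the notebooks. It is a sketch, not part of the project; the helper name `missing_packages` is hypothetical, and it assumes the pip package names match the import names (`transformers`, `captum`).

```python
import importlib.util

# The two packages named above as missing from the default Colab environment.
REQUIRED = ("transformers", "captum")


def missing_packages(names=REQUIRED):
    """Return the package names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All extra dependencies are available.")
```

In a Colab cell, the suggested command would be run with a leading `!`, for example `!pip install transformers captum`.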

How to run code

There are two ways to run the code: on Google Colab or locally on your machine.

Option 1) Google Colab

Model training notebook: [https://colab.research.google.com/drive/1uPIi-OVchs_8A-SNcQtLfwelr0ccsz19?usp=sharing]

Model evaluation/analysis notebook: [https://colab.research.google.com/drive/1Hfy58JvyPbx55lKKhQAzzrhJIbN_Io0j?usp=sharing]

Option 2) Local Machine

Notebooks: You can run model_training.ipynb or model_evaluation.ipynb as is, changing the directory paths where needed to match your local copy of the repository.
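One way to keep path changes in a single place is to derive every directory from one root, as sketched below. This assumes the data and models directories sit directly under the repository root, as described in Code Organization; the helper `project_paths` and the example root are hypothetical and do not appear in the notebooks.

```python
from pathlib import Path


def project_paths(root):
    """Build the data and model directories from a single repository root."""
    root = Path(root)
    return {
        "data": root / "data",      # medical reports, stop words, saved metrics
        "models": root / "models",  # BERT and BioBERT checkpoints
    }


# Hypothetical location; replace with wherever the repository was cloned.
paths = project_paths("path/to/repo")
print(paths["data"])
```

With this layout, only the root passed to `project_paths` needs to change when moving between Colab and a local machine.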
