Code for the modular summarization work published at ACL 2021 by Krishna et al.

Last update: Nov 24, 2022

Overview

This repository contains the code for running the modular summarization pipelines described in the publication:
Krishna K, Khosla K, Bigham J, Lipton ZC. "Generating SOAP Notes from Doctor-Patient Conversations." ACL 2021.

Instructions

Although we cannot release models trained on the confidential medical data, we have released models trained on the publicly available AMI dataset.
To reproduce the results on the AMI dataset, follow the steps listed below. For convenience, we have also created a Google Colab notebook here that runs these steps on Google's servers (free of charge as of June 2021) and produces the summaries and their ROUGE scores.

Step 1: Set up the environment by installing the packages listed in requirements.txt with pip (pip install -r requirements.txt).

Step 2: Download the ami_models folder from this link and put it at the root of the repository.

Step 3: Run the following 3 commands to prepare the data, run the summary generation pipelines, and show the resulting ROUGE scores.

# command1: downloads and preprocesses the AMI dataset
./prepare_data.sh

# command2: runs the summarization pipelines on the data and computes ROUGE scores
# (before running this command, you need to download the models as shown above)
./predict_ami.sh

# command3: prints the results
python show_results.py
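
The three commands can also be driven from a single Python script, which makes it easier to catch a missing ami_models folder before the long-running steps start. The following is only a minimal sketch and is not part of this repository: the names `REQUIRED`, `COMMANDS`, `missing_prerequisites`, and `run_pipeline` are hypothetical, and the sketch assumes it is run from the repository root with the file and folder names used in the steps above. It uses `sys.executable` in place of a bare `python` so that show_results.py runs under the same interpreter as the script.

```python
import subprocess
import sys
from pathlib import Path

# Files and folders the steps above rely on (names taken from this README).
REQUIRED = ["ami_models", "prepare_data.sh", "predict_ami.sh", "show_results.py"]

# The three commands from Step 3, in the order they are listed.
COMMANDS = [
    ["./prepare_data.sh"],
    ["./predict_ami.sh"],
    [sys.executable, "show_results.py"],
]


def missing_prerequisites(repo_root):
    """Return the required names that do not exist under repo_root."""
    root = Path(repo_root)
    return [name for name in REQUIRED if not (root / name).exists()]


def run_pipeline(repo_root=".", runner=subprocess.run):
    """Run the three commands in order, stopping at the first failure."""
    missing = missing_prerequisites(repo_root)
    if missing:
        raise FileNotFoundError(
            "Missing from repository root: " + ", ".join(missing)
        )
    for command in COMMANDS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed step prevents the later steps from running.
        runner(command, cwd=repo_root, check=True)
```

Calling `run_pipeline()` from the repository root would then run the data preparation, prediction, and result-printing steps in sequence. Checking for ami_models up front is a design choice of this sketch; the steps above only require the folder before the second command.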

Owner

Approximately Correct Machine Intelligence (ACMI) Lab
