This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports"

Overview

Introduction: X-Ray Report Generation

This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports". Our work adopts x-ray (also including some history data for patients if there are any) as input, a CNN is used to learn the embedding features for x-ray, as a result, disease-state-style information (Previously, almost all work used detected disease embedding for input of text generation network which could possibly exclude the false negative diseases) is extracted and fed into the text generation network (transformer). To make sure the consistency of detected diseases and generated x-ray reports, we also create a interpreter to enforce the accuracy of the x-ray reports. For details, please refer to here.

Data we used for experiments

We use two datasets for experiments to validate our method:

Performance on two datasets

Datasets Methods BLEU-1 BLEU-2 BLEU-3 BLEU-4 METEOR ROUGE-L
Open-I Single-view 0.463 0.310 0.215 0.151 0.186 0.377
Multi-view 0.476 0.324 0.228 0.164 0.192 0.379
Multi-view w/ Clinical History 0.485 0.355 0.273 0.217 0.205 0.422
Full Model (w/ Interpreter) 0.515 0.378 0.293 0.235 0.219 0.436
MIMIC Single-view 0.447 0.290 0.200 0.144 0.186 0.317
Multi-view 0.451 0.292 0.201 0.144 0.185 0.320
Multi-view w/ Clinical History 0.491 0.357 0.276 0.223 0.213 0.389
Full Model (w/ Interpreter) 0.495 0.360 0.278 0.224 0.222 0.390

Environments for running codes

  • Operating System: Ubuntu 18.04

  • Hardware: tested with RTX 2080 TI (11G)

  • Software: tested with PyTorch 1.5.1, Python3.7, CUDA 10.0, tensorboardX, tqdm

  • Anaconda is strongly recommended

  • Other Libraries: Spacy, SentencePiece, nlg-eval

How to use our code for train/test

Step 0: Build your vocabulary model with SentencePiece (tools/vocab_builder.py)

  • Please make sure that you have preprocess the medical reports accurately.
  • We use the top 900 high-frequency words
  • We use 100 unigram tokens extracted from SentencePiece to avoid the out-of-vocabulary situation.
  • In total we have 1000 words and tokens. Update: You can skip step 0 and use the vocabulary files in Vocabulary/*.model

Step 1: Train the LSTM and/or Transformer models, which are just text classifiers, to obtain 14 common disease labels.

  • Use the train_text.py to train the models on your working datasets. For example, the MIMIC-CXR comes with CheXpert labels; you can use these labels as ground-truth to train a differentiable text classifier model. Here the text classifier is a binary predictor (postive/uncertain) = 1 and (negative/unmentioned) = 0.
  • Assume the trained text classifier is perfect and exactly reflects the medical reports. Although this is not the case, in practice, it gives us a good approximation of how good the generated reports are. Human evaluation is also needed to evalutate the generated reports.
  • The goals here are:
  1. Evaluate the performance of the generated reports by comparing the predicted labels and the ground-truth labels.
  2. Use the trained models to fine-tune medical reports' output.

Step 2: Test the text classifier models using the train_text.py with:

  • PHASE = 'TEST'
  • RELOAD = True --> Load the trained models for testing

Step 3: Transfer the trained model to obtain 14 common disease labels for the Open-I datasets and any dataset that doesn't have ground-truth labels.

  • Transfer the learned model to the new dataset by predicting 14 disease labels for the entire dataset by running extract_label.py on the target dataset. The output file is file2label.json
  • Split them into train, validation, and test sets (we have already done that for you, just put the file2label.json in a place where the NLMCXR dataset can see).
  • Build your own text classifier (train_text.py) based on the extracted disease labels (treat them as ground-truth labels).
  • In the end, we want the text classifiers (LSTM/Transformer) to best describe your model's output on the working dataset.

Step 4: Get additional labels using (tools/count_nounphrases.py)

  • Note that 14 disease labels are not enough to generate accurate reports. This is because for the same disease, we might have different ways to express it. For this reason, additional labels are needed to enhance the quality of medical reports.
  • The output of the coun_nounphrases.py is a json file, you can use it as input to the exising datasets such as MIMIC or NLMCXR.
  • Therefore, in total we have 14 disease labels + 100 noun-phrases = 114 disease-related topics/labels. Please check the appendix in our paper.

Step 5: Train the ClsGen model (Classifier-Generator) with train_full.py

  • PHASE = 'TRAIN'
  • RELOAD = False --> We trained our model from scratch

Step 6: Train the ClsGenInt model (Classifier-Generator-Interpreter) with train_full.py

  • PHASE = 'TRAIN'
  • RELOAD = True --> Load the ClsGen trained from the step 4, load the Interpreter model from Step 1 or 3
  • Reduce the learning rate --> Since the ClsGen has already converged, we need to reduce the learning rate to fine-tune the word representation such that it minimize the interpreter error.

Step 7: Generate the outputs

  • Use the infer function in the train_full.py to generate the outputs. This infer function ensures that no ground-truth labels and medical reports are being used in the inference phase (we used teacher forcing / ground-truth labels during training phase).
  • Also specify the threshold parameter, see the appendix of our paper on which threshold to choose from.
  • Final specify your the name of your output files.

Step 8: Evaluate the generated reports.

  • Use the trained text classifier model in step 1 to evaluate the clinical accuracy
  • Use the nlg-eval library to compute BLEU-1 to BLEU-4 scores and other metrics.

Our pretrained models

Our model is uploaded in google drive, please download the model from

Model Name Download Link
Our Model for MIMIC Google Drive
Our Model for NLMCXR Google Drive

Citation

If it is helpful to you, please cite our work:

@inproceedings{nguyen-etal-2021-automated,
    title = "Automated Generation of Accurate {\&} Fluent Medical {X}-ray Reports",
    author = "Nguyen, Hoang  and
      Nie, Dong  and
      Badamdorj, Taivanbat  and
      Liu, Yujie  and
      Zhu, Yingying  and
      Truong, Jason  and
      Cheng, Li",
    booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2021",
    address = "Online and Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.emnlp-main.288",
    doi = "10.18653/v1/2021.emnlp-main.288",
    pages = "3552--3569",
}

Owner
no name
no name
Over9000 optimizer

Optimizers and tests Every result is avg of 20 runs. Dataset LR Schedule Imagenette size 128, 5 epoch Imagewoof size 128, 5 epoch Adam - baseline OneC

Mikhail Grankin 405 Nov 27, 2022
PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT.

MoCo v3 for Self-supervised ResNet and ViT Introduction This is a PyTorch implementation of MoCo v3 for self-supervised ResNet and ViT. The original M

Facebook Research 887 Jan 08, 2023
Tensorflow Repo for "DeepGCNs: Can GCNs Go as Deep as CNNs?"

DeepGCNs: Can GCNs Go as Deep as CNNs? In this work, we present new ways to successfully train very deep GCNs. We borrow concepts from CNNs, mainly re

Guohao Li 612 Nov 15, 2022
Analysis of Antarctica sequencing samples contaminated with SARS-CoV-2

Analysis of SARS-CoV-2 reads in sequencing of 2018-2019 Antarctica samples in PRJNA692319 The samples analyzed here are described in this preprint, wh

Jesse Bloom 4 Feb 09, 2022
i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery

i-SpaSP: Structured Neural Pruning via Sparse Signal Recovery This is a public code repository for the publication: i-SpaSP: Structured Neural Pruning

Cameron Ronald Wolfe 5 Nov 04, 2022
SmartSim Infrastructure Library.

Home Install Documentation Slack Invite Cray Labs SmartSim SmartSim makes it easier to use common Machine Learning (ML) libraries like PyTorch and Ten

Cray Labs 139 Jan 01, 2023
Facilitates implementing deep neural-network backbones, data augmentations

Introduction Nowadays, the training of Deep Learning models is fragmented and unified. When AI engineers face up with one specific task, the common wa

40 Dec 29, 2022
The mini-MusicNet dataset

mini-MusicNet A music-domain dataset for multi-label classification Music transcription is sequence-to-sequence prediction problem: given an audio per

John Thickstun 4 Nov 09, 2022
Neural network-based build time estimation for additive manufacturing

Neural network-based build time estimation for additive manufacturing Oh, Y., Sharp, M., Sprock, T., & Kwon, S. (2021). Neural network-based build tim

Yosep 1 Nov 15, 2021
The object detection pipeline is based on Ultralytics YOLOv5

AYOLOv2 The main goal of this repository is to rewrite the object detection pipeline with a better code structure for better portability and adaptabil

153 Dec 22, 2022
High-fidelity 3D Model Compression based on Key Spheres

High-fidelity 3D Model Compression based on Key Spheres This repository contains the implementation of the paper: Yuanzhan Li, Yuqi Liu, Yujie Lu, Siy

5 Oct 11, 2022
[ECCV 2020] XingGAN for Person Image Generation

Contents XingGAN or CrossingGAN Installation Dataset Preparation Generating Images Using Pretrained Model Train and Test New Models Evaluation Acknowl

Hao Tang 218 Oct 29, 2022
RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching

RAFT-Stereo: Multilevel Recurrent Field Transforms for Stereo Matching This repository contains the source code for our paper: RAFT-Stereo: Multilevel

Princeton Vision & Learning Lab 328 Jan 09, 2023
Pytorch code for "Text-Independent Speaker Verification Using 3D Convolutional Neural Networks".

:speaker: Deep Learning & 3D Convolutional Neural Networks for Speaker Verification

Amirsina Torfi 114 Dec 18, 2022
Code for "Learning Canonical Representations for Scene Graph to Image Generation", Herzig & Bar et al., ECCV2020

Learning Canonical Representations for Scene Graph to Image Generation (ECCV 2020) Roei Herzig*, Amir Bar*, Huijuan Xu, Gal Chechik, Trevor Darrell, A

roei_herzig 24 Jul 07, 2022
An open source machine learning library for performing regression tasks using RVM technique.

Introduction neonrvm is an open source machine learning library for performing regression tasks using RVM technique. It is written in C programming la

Siavash Eliasi 33 May 31, 2022
Lightweight Face Image Quality Assessment

LightQNet This is a demo code of training and testing [LightQNet] using Tensorflow. Uncertainty Losses: IDQ loss PCNet loss Uncertainty Networks: Mobi

Kaen 5 Nov 18, 2022
Official repository for the paper "Going Beyond Linear Transformers with Recurrent Fast Weight Programmers"

Recurrent Fast Weight Programmers This is the official repository containing the code we used to produce the experimental results reported in the pape

IDSIA 36 Nov 15, 2022
Classify music genre from a 10 second sound stream using a Neural Network.

MusicGenreClassification Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University. Featured in

Matan Lachmish 453 Dec 27, 2022