Implementation of the paper "Improved Clinical Abbreviation Expansion via Non-Sense-Based Approaches", presented at the ML4H (Machine Learning for Health) workshop at NeurIPS 2020.
This repository contains the non-sense-based (without gloss) and sense-based (with gloss) approaches to clinical abbreviation expansion built on BERT (the code for the permutation language model variant will be released in a separate repository). The code is based on BlueBERT (previously named NCBI-BERT), a biomedical version of BERT.
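The core idea of the non-sense-based approach is to treat expansion as masked language modeling: the abbreviation is replaced with [MASK] tokens and the model scores each candidate expansion in context. Below is a minimal sketch of this idea using the HuggingFace `transformers` library and a generic BERT checkpoint as a stand-in for BlueBERT; it is an illustration only, not this repository's TensorFlow 1.x code, and the sentence and candidate list are made up.

```python
# Illustration only: scoring candidate expansions of an abbreviation with a
# masked LM. Uses HuggingFace transformers and a generic BERT checkpoint as a
# stand-in for BlueBERT; not the repository's TensorFlow 1.x code.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

# Made-up clinical context; "{}" marks where the abbreviation "pt" appeared.
context = "The {} was discharged home in stable condition."
candidates = ["patient", "physical therapy", "prothrombin time"]

def score(expansion):
    # Replace the abbreviation with one [MASK] per wordpiece of the candidate,
    # then sum the log-probabilities the model assigns to those wordpieces.
    ids = tokenizer(expansion, add_special_tokens=False)["input_ids"]
    masked = context.format(" ".join([tokenizer.mask_token] * len(ids)))
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
    log_probs = torch.log_softmax(logits[0, positions], dim=-1)
    return sum(log_probs[i, tok].item() for i, tok in enumerate(ids))

print(max(candidates, key=score))  # pick the highest-scoring expansion
```

In the actual pipeline, BlueBERT is fine-tuned on the target dataset before evaluation; see the commands below.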
- TensorFlow 1.12+
- Pre-trained model of BlueBERT
- A clinical abbreviation expansion dataset (MSH, UMN, or ShARe/CLEF 2013 Task 2)
# Install the required Python packages in your environment
$ pip install -r requirement.txt
# Download the BlueBERT parameters
$ wget https://ftp.ncbi.nlm.nih.gov/pub/lu/Suppl/NCBI-BERT/NCBI_BERT_pubmed_mimic_uncased_L-12_H-768_A-12.zip
$ unzip NCBI_BERT_pubmed_mimic_uncased_L-12_H-768_A-12.zip -d bert_models
# Download and prepare dataset
$ ./scripts/download_umn.sh # UMN
$ ./scripts/download_msh.sh # MSH (manual download of the dataset required)
# For the ShARe/CLEF dataset, run the notebook scripts/preprocess_sc13t2.ipynb (manual download and installation required)
# Fine-tune and evaluate the model.
$ ./scripts/umn_masklm2.sh # Masked LM on UMN (one fold of the 10-fold CV)
$ ./scripts/msh_masklm2_new.sh # Masked LM on MSH (one fold of the 10-fold CV)
$ ./scripts/sc13t2_masklm2.sh # Masked LM on ShARe/CLEF. Then run scripts/evaluate_sc13t2_lrabr.ipynb to compute the accuracy on unseen test examples.
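For reference, here is a hypothetical sketch of what an accuracy restricted to unseen examples can look like, under the assumption that "unseen" means the gold expansion never occurred during fine-tuning; the exact criterion is defined in scripts/evaluate_sc13t2_lrabr.ipynb, and the function and data below are made up.

```python
# Hypothetical sketch of an "unseen" accuracy computation; the actual criterion
# is defined in scripts/evaluate_sc13t2_lrabr.ipynb.
def unseen_accuracy(predictions, golds, train_expansions):
    # Keep only test examples whose gold expansion never occurred in training.
    unseen = [(p, g) for p, g in zip(predictions, golds) if g not in train_expansions]
    return sum(p == g for p, g in unseen) / max(len(unseen), 1)

# Example usage with made-up data:
preds = ["intravenous", "patient", "prothrombin time"]
golds = ["intravenous", "physical therapy", "prothrombin time"]
seen_in_train = {"intravenous"}
print(unseen_accuracy(preds, golds, seen_in_train))  # 0.5
```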
We thank the authors of BERT and BlueBERT for their implementations and for the weights pre-trained on biomedical corpora.
@inproceedings{juyong2020improved,
  author    = {Juyong Kim and Linyuan Gong and Justin Khim and Jeremy C. Weiss and Pradeep Ravikumar},
  title     = {Improved Clinical Abbreviation Expansion via Non-Sense-Based Approaches},
  booktitle = {Proceedings of the Machine Learning for Health NeurIPS Workshop (ML4H 2020)},
  year      = {2020}
}