WengLab-InformaticsResearch/mcephe

Introduction

This repository contains the source code for learning the medical concept embeddings (MCEs) used in the following paper:

Comparative Effectiveness of Medical Concept Embedding for Feature Engineering in Phenotyping
    Junghwan Lee*, Cong Liu*, Jae Hyun Kim, Alex Butler, Ning Shang,
    Chao Pang, Karthik Natarajan, Patrick Ryan, Casey Ta, Chunhua Weng
    Preprint

Learning Medical Concept Embeddings

We learn MCEs with GloVe, skip-gram, node2vec, LINE, and singular value decomposition (SVD). node2vec and LINE were implemented with OpenNE, an open-source Python toolkit for network embedding, and SVD with SciPy. The source code in this repository implements GloVe and skip-gram only.
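
GloVe is trained on counts of how often two concepts appear together, and SVD-based embeddings are commonly computed from a co-occurrence (or PMI-transformed) matrix; this README does not say which matrix was factorized. A minimal sketch of the counting step, assuming each window is a list of integer concept codes (the format described under Preparing Dataset) and that a concept repeated within one window is counted once. The helper name count_cooccurrences is hypothetical and is not taken from src/.

```python
from collections import Counter
from itertools import combinations

def count_cooccurrences(windows):
    """Count how many windows contain each unordered pair of concept codes.

    Assumption: a concept repeated inside one window is counted once
    (set() drops duplicates); the repository may handle repeats differently.
    """
    counts = Counter()
    for window in windows:
        # Sorting makes each pair a canonical (smaller, larger) key.
        for a, b in combinations(sorted(set(window)), 2):
            counts[(a, b)] += 1
    return counts

# Toy encoded windows (integer concept codes).
windows = [[0, 1], [0, 2, 3], [0, 1, 3]]
counts = count_cooccurrences(windows)
print(counts[(0, 1)])  # 2
print(counts[(1, 2)])  # 0 (never in the same window)
```

Because the counts are symmetric, a full matrix would store each value at both (a, b) and (b, a). With SciPy, scipy.sparse.linalg.svds computes a truncated SVD of such a sparse matrix; this is offered as an illustration, not as the authors' exact procedure.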

Preparing Dataset

The dataset must be prepared as two pickle files: the encoded windowed-EHR and a dictionary that maps the integer codes back to medical concepts. Example files in the expected formats are provided in data/.

Encoded windowed-EHR: a list of windows, where each window is a list of the medical concepts that occur in it. For example, [["Concept A", "Concept B"], ["Concept A", "Concept C", "Concept D"], ...], shown here with concept names for readability; in the file, every concept must be replaced by its integer code (e.g. [[0, 1], [0, 2, 3], ...]). There is no need to separate windows by patient, because only the co-occurrence of concepts within the same window is used.
Concept2id: a dictionary that maps each integer code to its medical concept (despite the name, the keys are the integers). For example, if "Concept A" is encoded as 0 and "Concept B" as 1, concept2id is {0: "Concept A", 1: "Concept B"}.
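
The two files can be produced from raw windows of concept names roughly as follows. This is a sketch under stated assumptions: the input windows, the helper names encode_windows and save_dataset, and the file names are all hypothetical, and codes are assigned in order of first appearance, which the repository does not require.

```python
import os
import pickle
import tempfile

def encode_windows(raw_windows):
    """Give each distinct concept an integer code and encode every window with it."""
    concept_to_code = {}
    encoded = []
    for window in raw_windows:
        # setdefault assigns a new concept the next unused integer (0, 1, 2, ...).
        encoded.append([concept_to_code.setdefault(c, len(concept_to_code)) for c in window])
    # Invert to the integer -> concept layout that concept2id uses.
    concept2id = {code: concept for concept, code in concept_to_code.items()}
    return encoded, concept2id

def save_dataset(encoded, concept2id, record_path, concept2id_path):
    """Write the two pickles meant for --input_record and --input_concept2id."""
    # Protocol 4 is readable by Python 3.4+, including the Python 3.5.2 listed below.
    with open(record_path, "wb") as f:
        pickle.dump(encoded, f, protocol=4)
    with open(concept2id_path, "wb") as f:
        pickle.dump(concept2id, f, protocol=4)

# Hypothetical raw input with concept names in place of integer codes.
raw_windows = [["Concept A", "Concept B"], ["Concept A", "Concept C", "Concept D"]]
encoded, concept2id = encode_windows(raw_windows)
print(encoded)     # [[0, 1], [0, 2, 3]]
print(concept2id)  # {0: 'Concept A', 1: 'Concept B', 2: 'Concept C', 3: 'Concept D'}

# Hypothetical file names, written to a temporary directory for this demo.
out_dir = tempfile.mkdtemp()
record_path = os.path.join(out_dir, "ehr_windows.pkl")
concept2id_path = os.path.join(out_dir, "concept2id.pkl")
save_dataset(encoded, concept2id, record_path, concept2id_path)
```

Comparing the output against the example files in data/ is a quick way to confirm the layout before training.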

Learning Medical Concept Embedding using GloVe

1. Install Python 3.5.2 and all packages listed in requirements.txt.
2. Prepare the dataset as described in Preparing Dataset.
3. Start training:

python src/GloVe.py --input_record <"path of the EHR dataset"> --input_concept2id <"path of the concept2id"> --output <"output path"> --dim <"dimensionality of the embedding"> --batch_size <"batch size for training"> --num_epochs <"training epochs"> --learning_rate <"learning rate for optimizer">

To see the description and default value of each hyperparameter, run:

python src/GloVe.py --help
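
The README does not describe the objective that src/GloVe.py optimizes. For reference, the sketch below evaluates the weighted least-squares loss from the original GloVe paper (Pennington et al., 2014): J = Σ f(X_ij) (w_i · w̃_j + b_i + b̃_j − log X_ij)², summed over non-zero co-occurrence counts X_ij, with f(x) = (x / x_max)^α for x < x_max and 1 otherwise. The defaults x_max = 100 and α = 0.75 come from that paper; whether src/GloVe.py uses the same weighting or defaults is an assumption, and the function names are hypothetical. Training would minimize J with a gradient-based optimizer (hence --learning_rate), which is not shown.

```python
import math

def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(x): down-weights rare co-occurrences, capped at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(cooc, W, W_ctx, b, b_ctx, x_max=100.0, alpha=0.75):
    """Weighted least-squares GloVe loss.

    cooc maps (i, j) -> X_ij and must hold only non-zero counts (log 0 is undefined).
    W / W_ctx are lists of concept and context vectors; b / b_ctx are scalar biases.
    """
    loss = 0.0
    for (i, j), x in cooc.items():
        dot = sum(wi * wj for wi, wj in zip(W[i], W_ctx[j]))
        diff = dot + b[i] + b_ctx[j] - math.log(x)
        loss += glove_weight(x, x_max, alpha) * diff ** 2
    return loss

# Toy example: one co-occurrence count and 1-dimensional vectors.
cooc = {(0, 1): 1.0}
print(glove_loss(cooc, W=[[1.0], [0.0]], W_ctx=[[0.0], [2.0]], b=[0.0, 0.0], b_ctx=[0.0, 0.0]))
```

The cap in f keeps very frequent pairs (for example, ubiquitous lab orders) from dominating the loss, while the log makes the model fit ratios of counts rather than raw counts.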

Learning Medical Concept Embedding using skip-gram

1. Install Python 3.5.2 and all packages listed in requirements.txt.
2. Prepare the dataset as described in Preparing Dataset.
3. Start training:

python src/skipgram.py --input_record <"path of the EHR dataset"> --input_concept2id <"path of the concept2id"> --output <"output path"> --dim <"dimensionality of the embedding"> --batch_size <"batch size for training"> --num_epochs <"training epochs"> --learning_rate <"learning rate for optimizer">

To see the description and default value of each hyperparameter, run:

python src/skipgram.py --help
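
Because a window carries no order among its concepts, one natural way to form skip-gram training pairs is to treat every other concept in the same window as context for each target concept. The sketch below assumes exactly that; how src/skipgram.py actually builds, samples, or weights its pairs is not described in this README, and skipgram_pairs is a hypothetical helper.

```python
def skipgram_pairs(windows):
    """Build (target, context) training pairs from unordered concept windows.

    Assumption: every other concept in the same window is a context of the target,
    since windows carry no within-window order.
    """
    pairs = []
    for window in windows:
        for i, target in enumerate(window):
            for j, context in enumerate(window):
                if i != j:  # a concept is not its own context
                    pairs.append((target, context))
    return pairs

print(skipgram_pairs([[0, 1], [0, 2, 3]]))
# [(0, 1), (1, 0), (0, 2), (0, 3), (2, 0), (2, 3), (3, 0), (3, 2)]
```

In a skip-gram model with negative sampling, each pair would then be contrasted with a few randomly drawn non-context concepts; that training step is omitted here.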

Acknowledgement

The data containing protected health information have been removed from all publicly available materials.