This repository is the official implementation of "Explainable Prediction of Acute Myocardial Infarction using Machine Learning and Shapley Values", published in IEEE Access in November 2020.
- To install Python 3, follow these instructions.
- To install Pip, follow these instructions.
- To install Jupyter Lab/Notebook, follow these instructions. To run Jupyter Lab/Notebook, follow these instructions.
- To set up a virtual environment, follow these instructions.
- To install requirements:
pip3 install -r requirements.txt
- To obtain the ECG-ViEW II dataset, please use this form. After receiving the unprocessed files, follow the data processing steps below.
To process the ECG-ViEW II dataset as done in the paper (with robust scaling and SMOTE), run this notebook.
This notebook produces two CSV files, test.csv and train.csv, which you can then use to train and evaluate models.
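For readers unfamiliar with the two preprocessing steps named above, here is a minimal, dependency-free sketch of what they do. It is not the notebook's code: the function names `robust_scale` and `smote`, their parameters, and the toy data are assumptions made for illustration. In practice, library implementations such as scikit-learn's `RobustScaler` and imbalanced-learn's `SMOTE` are the usual choice.

```python
import random
import statistics


def robust_scale(column):
    """Center a numeric column on its median and divide by its interquartile range."""
    q1, median, q3 = statistics.quantiles(column, n=4, method="inclusive")
    iqr = (q3 - q1) or 1.0  # fall back to 1.0 so constant columns do not divide by zero
    return [(value - median) / iqr for value in column]


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: build n_new synthetic rows by interpolating between
    a randomly chosen minority row and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other rows
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Sort the other minority rows by squared Euclidean distance to the base row.
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(row, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy data (made up for this example, not taken from ECG-ViEW II).
scaled = robust_scale([1, 2, 3, 4, 5])
new_rows = smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=4, k=2)
```

Robust scaling uses the median and interquartile range rather than the mean and standard deviation, so outliers have less influence on the scaled values. SMOTE balances the classes by adding synthetic minority-class rows that lie between existing ones.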
- To train the CNN model in the paper, run this notebook.
- To train the RNN model in the paper, run this notebook.
- To train the XGBoost model in the paper, run this notebook.
Each of these notebooks trains its model and saves it to a file that can be imported later for evaluation (described in the next section).
- To evaluate the CNN on the processed ECG-ViEW II data, run this notebook.
- To evaluate the RNN on the processed ECG-ViEW II data, run this notebook.
- To evaluate the XGBoost model on the processed ECG-ViEW II data, run this notebook.
To reproduce the results in the paper, use the pretrained models. Additionally, to train and evaluate models without the age and sex features, please see these folders (CNN, RNN).
You can download pretrained models here:
- With age and sex:
- Without age and sex:
Our models achieve the following performance:
| Model | Accuracy | F1 Score | AUROC | Sensitivity | Specificity |
|---|---|---|---|---|---|
| CNN | 89.9 % | 89.0 % | 90.7 % | 88.1 % | 93.2 % |
| RNN | 84.6 % | 82.2 % | 82.9 % | 78.0 % | 87.8 % |
| XGBoost | 97.5 % | 97.1 % | 96.5 % | 93.5 % | 99.4 % |
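
As a reminder of how the metrics in the table are defined, the following sketch computes them for binary labels using only the standard library. It is an illustration, not the evaluation code from the notebooks: the function names and the toy labels are assumptions, and it assumes that 1 marks the positive (AMI) class and that F1 is reported for that class.

```python
def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, F1, sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


def auroc(y_true, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half). Quadratic in the sample count, so only for small inputs."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return _ratio(wins, len(positives) * len(negatives))


# Toy labels and scores, made up for this example.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1])
area = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

Sensitivity and specificity are reported separately because accuracy alone can hide poor performance on one class.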
Shapley analysis of the XGBoost model shows that age, ACCI, and QRS duration are the most important variables for predicting the onset of AMI, while sex is of relatively little importance. Shapley analysis is a promising technique for uncovering the mechanisms of a prediction model, leading to a higher degree of interpretability and transparency.
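
To make the idea of a Shapley value concrete, here is a brute-force sketch that computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over every feature ordering. It is purely illustrative: `shapley_values`, `toy_model`, and the input values are hypothetical, "absent" features are assumed to take a baseline value, and the cost grows factorially with the number of features. Libraries such as `shap` use much more efficient algorithms for tree models.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features not yet "added" keep their baseline value; each feature's
    contribution is averaged over all orderings in which features are added.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = model(current)
        for i in order:
            current[i] = x[i]  # switch feature i from baseline to its actual value
            value = model(current)
            totals[i] += value - previous  # marginal contribution of feature i
            previous = value
    return [total / len(orderings) for total in totals]


def toy_model(features):
    """Hypothetical model with an interaction term, made up for this example."""
    x0, x1, x2 = features
    return 2.0 * x0 + 1.0 * x1 + 0.5 * x2 + 0.25 * x0 * x1


x = [4.0, 2.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# Efficiency property: the values sum to the gap between the prediction
# for x and the prediction for the baseline.
assert abs(sum(phi) - (toy_model(x) - toy_model(baseline))) < 1e-9
```

The efficiency property checked at the end is what lets each prediction be decomposed into per-feature contributions.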
The local explanation summary (beeswarm) plot gives an overview of the impact of each feature on the prediction; each dot represents the Shapley value of one feature for one sample.
The global feature importance plot shows the mean absolute Shapley value of each feature over the whole test dataset. Age (Birthyeargroup), ACCI, and QRS duration were observed to be the most important features for the prediction.
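
The aggregation behind a global importance ranking can be sketched in a few lines: take the absolute Shapley value of each feature in each sample, average over samples, and sort. The function name `global_importance`, the feature names, and the numbers below are placeholders made up for illustration; they are not values from the paper.

```python
def global_importance(shap_rows, feature_names):
    """Rank features by mean absolute Shapley value across samples.

    shap_rows holds one list of per-feature Shapley values for each sample.
    """
    n_samples = len(shap_rows)
    means = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    # Largest mean absolute value first.
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


# Placeholder Shapley values for two samples and three features.
rows = [
    [0.5, -0.25, 0.0],
    [-1.5, 0.75, 0.5],
]
ranking = global_importance(rows, ["feature_a", "feature_b", "feature_c"])
```

Taking the absolute value first matters: a feature that pushes some predictions up and others down would otherwise average out to near zero despite having a large effect.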
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/