This repository was created as part of my master's degree thesis.
The Chinese Mammography Database (CMMD) is a recently released mammography dataset which is rich in quality yet has not been researched extensively due to its very recent release to The Cancer Imaging Archive (TCIA). This repository provides a deep learning pipeline which can be used either partially, for CMMD data pre-processing, or as a full pipeline in which models can be interchanged or optimised as best suits. It is hoped that the research carried out within my dissertation and within this repository will act as an aid for future researchers aiming to make use of the CMMD dataset.
A PDF version of my dissertation, including model results, will be uploaded upon receiving marks/feedback on my submission. Until then, please feel free to make the most of this repository, and to contact me if you have any questions whatsoever.
Good Luck.
- CMMD Data Exploration
- CMMD Metadata Pre-processing
- Split Data to Benign/Malignant with Stratification by Patient
- Train
- Validate
- Test
- Data Augmentation
- Training
- Custom Model Definition
- AlexNet
- LeNet
- Transfer Learning
- ResNet50
- VGG16
- Xception
- Fine Tuning
- Custom Model Definition
- Testing
- Model Evaluation
- Model Predictions
- Model Metrics
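The patient-level stratified split listed above can be sketched as follows. This is an illustrative helper, not the repository's actual implementation: the function name `stratified_patient_split` and the 70/15/15 split fractions are assumptions. The key property it demonstrates is that all images from one patient land in exactly one subset, while the benign/malignant ratio stays roughly equal across subsets.

```python
import random
from collections import defaultdict

def stratified_patient_split(labels_by_patient, train_frac=0.7, val_frac=0.15, seed=42):
    """Split patient IDs into train/val/test so that no patient appears in
    more than one subset, keeping the benign/malignant ratio roughly equal.

    labels_by_patient: dict mapping patient ID -> "benign" or "malignant".
    """
    rng = random.Random(seed)
    # Group patients by class label, then split each class independently
    # so every subset inherits the overall class balance.
    by_label = defaultdict(list)
    for patient, label in labels_by_patient.items():
        by_label[label].append(patient)
    splits = {"train": [], "val": [], "test": []}
    for patients in by_label.values():
        patients.sort()          # deterministic base order before shuffling
        rng.shuffle(patients)
        n_train = int(len(patients) * train_frac)
        n_val = int(len(patients) * val_frac)
        splits["train"] += patients[:n_train]
        splits["val"] += patients[n_train:n_train + n_val]
        splits["test"] += patients[n_train + n_val:]
    return splits
```

Splitting by patient rather than by image matters because CMMD patients can have multiple views; an image-level split would leak the same patient into both train and test sets and inflate evaluation metrics.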
- Python 3.8.10
- CUDA 10.1
It should additionally be noted that this was written on Linux (Ubuntu 20.04.3 LTS). Attempt on Windows or other operating systems at your own risk.
Get required Python packages:
git clone https://github.com/CraigMyles/cggm-mammography-classification.git
cd cggm-mammography-classification
pip install -r requirements.txt
The CMMD dataset can be freely downloaded from The Cancer Imaging Archive (TCIA). You must use the NBIA Data Retriever to download The Chinese Mammography Database (CMMD).
For download instructions, follow this guide: Downloading Data from the TCIA Data Portal Using the Data Retriever.
Download the CMMD manifest and clinical data file to a folder within your working directory.
E.g.
- /path/to/my/dir/dataset/manifest-1616439774456/
- /path/to/my/dir/dataset/CMMD_clinicaldata_revision.xlsx
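Before starting any preprocessing, it can help to confirm the downloaded files match the layout above. The `check_dataset_layout` helper below is a hypothetical sketch, not part of the repository:

```python
from pathlib import Path

def check_dataset_layout(manifest_path, metadata_path):
    """Sanity-check that the downloaded CMMD files are where the
    pipeline expects them: a manifest folder and a clinical data .xlsx."""
    manifest = Path(manifest_path)
    metadata = Path(metadata_path)
    if not manifest.is_dir():
        raise FileNotFoundError(f"Manifest folder not found: {manifest}")
    if metadata.suffix != ".xlsx" or not metadata.is_file():
        raise FileNotFoundError(f"Clinical data .xlsx not found: {metadata}")
    return True
```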
To run the pipeline in its entirety, begin with file 0_ and run each notebook in sequential order until 5_.
If you only require the CMMD preprocessing and metadata handling, please run the notebooks 0_0_Data_Exploration.ipynb, 0_Data_Exploration.ipynb, and 1_stratification_data_split.ipynb.
A collated main.py has been added which allows the pipeline to be run in its entirety when given a path to the manifest folder and the .xlsx metadata file. To run from the command line, use the following command with your relevant paths.
python3 main.py --manifest_path "/path/to/my/dir/dataset/manifest-1616439774456/" \
--metadata_path "/path/to/my/dir/dataset/CMMD_clinicaldata_revision.xlsx"
This will run the entire pipeline with the Xception model and fine-tuning. For partial use of the program, comment out particular method calls in main.py, or refer to the Jupyter notebooks. Please note that the main.py script may take upwards of 12 hours to run, and ensure that at least 25 GB of storage is available to hold the dataset and run the preprocessing scripts.
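The two flags shown above can be parsed with a small `argparse` setup. The sketch below mirrors the interface of the command, but is an assumption about main.py's internals, not a copy of them:

```python
import argparse

def build_parser():
    """Minimal parser mirroring the two paths the main.py command expects."""
    parser = argparse.ArgumentParser(description="Run the CMMD deep learning pipeline.")
    parser.add_argument("--manifest_path", required=True,
                        help="Path to the TCIA manifest folder.")
    parser.add_argument("--metadata_path", required=True,
                        help="Path to the CMMD clinical data .xlsx file.")
    return parser
```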
- Craig Myles (me@craig.im)