If you want to train a breast cancer classifier or a segmentation model on the CBIS-DDSM dataset, this repository can help you extract the mammograms and the masks from the original folder.
- The dataset can be downloaded directly from the official site.
- For more detail about the CBIS-DDSM dataset, see this paper. It describes how the dataset was built and how to use it.
Although the paper states that CBIS-DDSM has 753 calcification cases and 891 mass cases, it is difficult to determine how many images the dataset actually contains. According to the metadata provided in the CSV files, CBIS-DDSM contains 3,103 mammograms, 465 of which have more than one abnormality. Of these, 2,458 mammograms (79.21%) belong to the training set and 645 (20.79%) belong to the test set. In addition, 3,568 cropped mammograms and 3,568 masks are included.
This script contains a function that retrieves the path of all mammograms on your local machine and merges each image path with its pathology in a data frame. The data frame is subsequently saved as a CSV file.
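The idea can be sketched as follows. This is a minimal illustration, not the repository's actual function: the name `build_path_table`, the `image_id` column, and the assumption that each image sits in a folder named after its identifier are all hypothetical, and the real CBIS-DDSM CSV files use their own column names.

```python
from pathlib import Path

import pandas as pd


def build_path_table(image_root, metadata, suffix=".dcm"):
    """Pair every image found under image_root with its pathology label.

    Assumption (hypothetical layout): each image lives in a folder whose
    name equals the `image_id` used in the metadata table.
    """
    records = [
        {"image_id": p.parent.name, "image_path": str(p)}
        for p in sorted(Path(image_root).rglob(f"*{suffix}"))
    ]
    paths = pd.DataFrame(records, columns=["image_id", "image_path"])

    # Keep one row per (image, pathology) pair; a mammogram with several
    # abnormalities of the same pathology would otherwise be repeated.
    labels = metadata[["image_id", "pathology"]].drop_duplicates()

    # The inner join drops images that have no metadata row.
    return paths.merge(labels, on="image_id", how="inner")
```

The resulting data frame can then be written to disk with `table.to_csv("mammograms.csv", index=False)`.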
This script contains a function that retrieves the path of all patches on your local machine and then merges each mask path with its pathology in a data frame. The data frame is subsequently saved as a CSV file. Note: There are more masks than mammograms because some mammograms have more than one lesion.
The images provided by CBIS-DDSM (mammograms, masks, crops of abnormalities) are saved in DICOM format. This function converts a 16-bit mammogram from DICOM into a rescaled 16-bit PNG file.
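A conversion of this kind might look like the sketch below. The README does not say how the rescaling is done, so the min-max stretch to the full 16-bit range is an assumption, as are the function names `rescale_to_uint16` and `dicom_to_png`. Reading and writing rely on the third-party packages `pydicom` and `opencv-python`, imported lazily so the rescaling helper can be used without them.

```python
import numpy as np


def rescale_to_uint16(pixels):
    """Linearly stretch pixel values to the full 16-bit range [0, 65535].

    Assumption: a plain min-max rescale; the original script may differ.
    """
    pixels = np.asarray(pixels, dtype=np.float64)
    lo, hi = pixels.min(), pixels.max()
    if hi == lo:
        # A constant image carries no contrast; map it to all zeros.
        return np.zeros(pixels.shape, dtype=np.uint16)
    scaled = (pixels - lo) / (hi - lo) * 65535.0
    return np.round(scaled).astype(np.uint16)


def dicom_to_png(dicom_path, png_path):
    """Read a DICOM file and save its pixel data as a 16-bit PNG."""
    # Lazy imports: both packages are third-party dependencies.
    import cv2
    import pydicom

    pixels = pydicom.dcmread(str(dicom_path)).pixel_array
    # OpenCV writes a 16-bit PNG when given a uint16 array.
    cv2.imwrite(str(png_path), rescale_to_uint16(pixels))
```

Keeping the output at 16 bits avoids discarding intensity resolution that an 8-bit PNG would lose.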
This script creates the training and test sets according to the standardized split given in the official paper. The paths of all images are stored in a data frame, which is saved as a CSV file.
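One way to recover the split is sketched below. It assumes that each image path contains the word "Training" or "Test", which is how the split is taken to be encoded in the dataset's folder names here; this is an assumption to verify against your local copy. The names `assign_split`, `split_table`, and the `image_path` column are hypothetical.

```python
import pandas as pd


def assign_split(image_path):
    """Infer the split from the path (assumed naming convention)."""
    if "Training" in image_path:
        return "train"
    if "Test" in image_path:
        return "test"
    raise ValueError(f"Cannot infer split from path: {image_path}")


def split_table(table):
    """Return (train, test) data frames based on the `image_path` column."""
    labeled = table.assign(split=table["image_path"].map(assign_split))
    train = labeled[labeled["split"] == "train"].reset_index(drop=True)
    test = labeled[labeled["split"] == "test"].reset_index(drop=True)
    return train, test
```

Each part can then be saved separately, for example with `train.to_csv("train.csv", index=False)`.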
In this repository, I implemented the deep learning classifier introduced in the paper "Deep Learning to Improve Breast Cancer Detection on Screening Mammography" using PyTorch and the CBIS-DDSM dataset. The original code and model are available here; however, that code is written in Keras.
My main goal is to provide an understandable implementation of this model, which can be helpful for everyone, especially those who are beginning to work with deep learning and are interested in medical applications.