Pipeline that uses EDDL and ECVL to train a CNN on five different datasets (MNIST, ISIC, PNEUMOTHORAX, MSSEG and DHUC11), applying different image augmentations, for both classification and segmentation tasks.
| ECVL | EDDL |
|------|------|
| 1.0.3 | 1.0.4b |
- CMake 3.13 or later
- C++ Compiler with C++17 support (e.g. GCC 7 or later, Clang 5.0 or later, Visual Studio 2017 or later)
- (Optional) ISIC dataset.
- (Optional) Pneumothorax dataset.
- (Optional) MSSEG dataset.
The YAML dataset format is described here. Each dataset listed below contains both the data and its YAML description file, but they can also be downloaded separately: ISIC classification, ISIC segmentation, Pneumothorax segmentation, Kidney segmentation, Kidney classification, Multiple Sclerosis segmentation.
Automatically downloaded and extracted by CMake.
ISIC - isic-archive.com
Classification: Download it from here and extract it. To run skin_lesion_classification you must provide the `--dataset_path` as `/path/to/isic_classification.yml` (the Training options section lists the other settings). See the Pretrained models section to download checkpoints.
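For example, a minimal training invocation might look like the following (the executable location depends on your build setup, the dataset path and the `-e`/`-b` values are placeholders, and all flags are documented under Training options):

```bash
# Sketch: classification training on one GPU with illustrative epoch/batch values
./SKIN_LESION_CLASSIFICATION \
    -d /path/to/isic_classification.yml \
    -e 50 -b 12 \
    --gpu 1
```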
Classification_2018: Download it from here and extract it. To run skin_lesion_classification_2018 you must provide the `--dataset_path` as `/path/to/isic_classification_2018.yml`.
Segmentation: Download it from here and extract it. To run skin_lesion_segmentation you must provide the `--dataset_path` as `/path/to/isic_segmentation.yml` (the Training options section lists the other settings). See the Pretrained models section to download checkpoints.
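A hedged sketch of a segmentation run that also saves the validation images (paths are placeholders; `--save_images` is assumed to behave as a boolean switch, as suggested by its default in Training options):

```bash
# Sketch: segmentation training that also dumps validation images to a result directory
./SKIN_LESION_SEGMENTATION \
    -d /path/to/isic_segmentation.yml \
    --gpu 1 \
    --save_images \
    -r ../output_images
```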
Dataset taken from a Kaggle challenge (more details here).
- Download training and test images here.
- Download the ground truth masks and the YAML dataset file from here.
- In order to copy the ground truth masks into the directories of the corresponding images, edit the `cpp/copy_ground_truth_pneumothorax.cpp` file with the path to the downloaded dataset and ground truth directory and run it. Move the YAML file into the siim dataset folder.
A short video showing these steps is available.
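A hypothetical sketch of these preparation steps (the executable name and every path below are assumptions for illustration only; the actual target name depends on how `copy_ground_truth_pneumothorax.cpp` is built in your setup):

```bash
# Hypothetical: after editing the paths inside cpp/copy_ground_truth_pneumothorax.cpp and rebuilding,
# run the helper to copy each mask next to its corresponding image
./COPY_GROUND_TRUTH_PNEUMOTHORAX
# then move the downloaded YAML dataset file into the siim dataset folder (placeholder names)
mv /path/to/downloads/pneumothorax.yml /path/to/siim/
```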
Of the 2669 distinct training images with a mask, 200 are randomly sampled as the validation set.
- Training set: 3086 total images - 80% with mask and 20% without mask.
- Validation set: 250 total images - 80% with mask and 20% without mask.
UC11 dataset: images cannot be provided publicly.
Dataset of the MSSEG challenge, which took place during MICCAI 2016 (https://portal.fli-iam.irisa.fr/msseg-challenge).
- Subscribe to the challenge in order to download the data here.
- Download the script at https://raw.githubusercontent.com/deephealthproject/use_case_pipeline/3rd_hackathon/dataset/extract_data.sh, save it in the `MSSEG` folder, and run it:

  ```bash
  cd ~
  mkdir MSSEG && cd MSSEG
  wget https://raw.githubusercontent.com/deephealthproject/use_case_pipeline/3rd_hackathon/dataset/extract_data.sh
  chmod +x extract_data.sh
  ./extract_data.sh
  ```
- Download the `ms_segmentation.yaml` file and put it inside the `MSSEG` directory:

  ```bash
  wget https://raw.githubusercontent.com/deephealthproject/use_case_pipeline/3rd_hackathon/dataset/ms_segmentation.yml
  ```
On Linux systems, starting from CUDA 10.1, cuBLAS libraries are installed in the /usr/lib/<arch>-linux-gnu/ or /usr/lib64/ directory instead of the CUDA toolkit directory, so the build may not find them there. Create a symlink to resolve the issue:

```bash
sudo ln -s /usr/lib/<arch>-linux-gnu/libcublas.so /usr/local/cuda-10.1/lib64/libcublas.so
```
- *nix
  - Building from scratch, assuming CUDA driver already installed if you want to use GPUs (video in which these steps are performed in a clean nvidia docker image):

    ```bash
    sudo apt update
    sudo apt install wget git make gcc-8 g++-8

    # cmake version >= 3.13 is required for ECVL
    wget https://cmake.org/files/v3.13/cmake-3.13.5-Linux-x86_64.tar.gz
    tar -xf cmake-3.13.5-Linux-x86_64.tar.gz
    # symbolic link for cmake
    sudo ln -s /<path/to>/cmake-3.13.5-Linux-x86_64/bin/cmake /usr/bin/cmake

    # symbolic link for cublas if we have cuda >= 10.1
    sudo ln -s /usr/lib/<arch>-linux-gnu/libcublas.so /usr/local/cuda-10.1/lib64/libcublas.so

    # if other versions of gcc (e.g., gcc-7) are present, set a higher priority to gcc-8 so that it is chosen as the default
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-8 80 --slave /usr/bin/g++ g++ /usr/bin/g++-8
    sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-7 70 --slave /usr/bin/g++ g++ /usr/bin/g++-7

    git clone https://github.com/deephealthproject/use-case-pipelines.git
    cd use-case-pipelines

    # install dependencies as sudo so that they will be installed in "standard" system directories
    chmod u+x install_dependencies.sh
    sudo ./install_dependencies.sh

    # install EDDL, OpenCV, ECVL and build the pipeline
    chmod u+x build_pipeline.sh
    ./build_pipeline.sh
    ```
  - Building with all the dependencies already installed:

    ```bash
    git clone https://github.com/deephealthproject/use-case-pipelines.git
    cd use-case-pipelines
    mkdir build && cd build

    # if ECVL is not installed in a "standard" system directory (like /usr/local/) you have to provide the installation directory
    cmake -Decvl_DIR=/<path/to>/ecvl/build/install ..
    make
    ```
- Windows
  - Building assuming `cmake >= 3.13`, `git`, Visual Studio 2017 or 2019, CUDA driver (if you want to use GPUs) already installed:

    ```
    # install EDDL and all its dependencies, OpenCV, ECVL and build the pipeline
    git clone https://github.com/deephealthproject/use-case-pipelines.git
    cd use-case-pipelines
    build_pipeline.bat
    ```
N.B. EDDL is built for GPU by default.
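If you need a CPU-only build instead, the EDDL target can typically be switched at configure time; a minimal sketch, assuming EDDL exposes a `BUILD_TARGET` CMake option (the option name and accepted values may differ between EDDL versions, so check the EDDL documentation):

```bash
# Assumption: BUILD_TARGET selects the EDDL backend (e.g. CPU or GPU); paths are placeholders
cd /<path/to>/eddl/build
cmake -D BUILD_TARGET=CPU ..
make -j$(nproc) && sudo make install
```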
The project creates different executables: MNIST_BATCH, MNIST_BATCH_FASTER, SKIN_LESION_CLASSIFICATION, SKIN_LESION_SEGMENTATION, PNEUMOTHORAX_SEGMENTATION, KIDNEY_SEGMENTATION, KIDNEY_CLASSIFICATION, MS_SEGMENTATION.
- Training:
- MNIST_BATCH loads the dataset with the deprecated ECVL LoadBatch, which is not parallelized. All the other executables run with a custom number of parallel threads. Default settings here.
- MNIST_BATCH_FASTER (default settings) and SKIN_LESION_CLASSIFICATION (default settings) train the neural network loading the dataset in batches (needed when the dataset is too large to fit in memory).
- SKIN_LESION_SEGMENTATION (default settings) trains the neural network loading the dataset (images and their ground truth masks) in batches for the segmentation task.
- PNEUMOTHORAX_SEGMENTATION (default settings) trains the neural network loading the dataset (images and their ground truth masks) in batches with a custom function for this specific segmentation task.
- KIDNEY_SEGMENTATION (default settings) trains the neural network loading the dataset (volumes and their ground truth masks), dividing them into slices with a custom function for this specific segmentation task.
- KIDNEY_CLASSIFICATION (default settings) trains the neural network loading the dataset (DICOM images and their labels), and calculates the metrics by aggregating the predictions for each patient.
- MS_SEGMENTATION trains the neural network loading the dataset (volumes and their ground truth masks) in batches with a custom function for this specific segmentation task. Each volume is loaded in memory and then some slices (specified by the `in_channels` variable) are extracted and used as input for the neural network.
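A hedged invocation sketch for this pipeline (the correspondence between the `in_channels` variable and the `-i, --input_channels` option is an assumption, and all values and paths are placeholders):

```bash
# Sketch: MS segmentation training using 5 neighbouring slices as network input channels
# (assumes -i/--input_channels corresponds to the in_channels variable described above)
./MS_SEGMENTATION -d /path/to/ms_segmentation.yaml -i 5 --gpu 1
```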
- Inference:
- To perform inference only, the `--skip_train` option has to be provided, and you will most likely also want to provide a checkpoint with weights from a previous training run via the `--checkpoint` option. See the Pretrained models section for checkpoints. For SKIN_LESION_SEGMENTATION you can perform the ensemble by providing `--checkpoint_dir` as the folder with all your checkpoints together with the `--ensemble` option.
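For example, a hedged sketch of an inference-only run and of the segmentation ensemble (checkpoint paths and the executable location are placeholders; the flags are documented under Training options):

```bash
# Sketch: inference only, loading weights from a previously saved ONNX checkpoint
./SKIN_LESION_CLASSIFICATION -d /path/to/isic_classification.yml \
    --skip_train -c /path/to/checkpoint.onnx --gpu 1

# Sketch: ensemble inference over all checkpoints stored in a folder (segmentation only)
./SKIN_LESION_SEGMENTATION -d /path/to/isic_segmentation.yml \
    --skip_train --ensemble --checkpoint_dir /path/to/checkpoints/ --gpu 1
```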
```
-e, --epochs          Number of training epochs
-b, --batch_size      Number of images for each batch
-n, --num_classes     Number of output classes
-s, --size            Size to which resize the input images
    --loss            Loss function
-l, --learning_rate   Learning rate
    --momentum        Momentum (default: 0.9)
    --model           Model of the network
-g, --gpu             Which GPUs to use. If not given, the network will run on CPU. (examples: --gpu 1 or --gpu=0,1 or --gpu=1,1)
    --lsb             How many batches are processed before synchronizing the model weights (default: 1)
-m, --mem             CS memory usage configuration (default: low_mem, other possibilities: mid_mem, full_mem)
    --save_images     Save validation images or not (default: false)
-r, --result_dir      Directory where the output images will be stored (default: ../output_images)
    --checkpoint_dir  Directory where the checkpoints will be stored (default: ../checkpoints)
-d, --dataset_path    Dataset path (mandatory - except for the mnist pipelines)
-c, --checkpoint      Path to the onnx checkpoint file
    --exp_name        Experiment name
-i, --input_channels  Number of the network input channels
-w, --workers         Number of parallel threads which produce tensors from images and labels
-q, --queue_ratio     Maximum queue size in which producers store samples will be batch_size*workers*queue_ratio
    --resume          Resume training from this epoch
-t, --skip_train      Skip training and perform only test (default: false)
    --ensemble        Perform ensemble (only available for skin_lesion_segmentation)
-h, --help            Print usage
```
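As an illustration of how these options combine (the numbers below are placeholders): with `-b 12 -w 4 -q 2`, the prefetching queue can hold up to 12*4*2 = 96 samples.

```bash
# Sketch: batched MNIST training on one GPU with 4 producer threads
# (MNIST pipelines do not require --dataset_path, as noted above)
./MNIST_BATCH_FASTER -e 10 -b 12 -w 4 -q 2 --gpu 1
```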
| | Model | Metric | Validation | Test | ONNX |
|---|---|---|---|---|---|
| ISIC classification | ResNet50 | Accuracy | 0.854 | 0.8394 | download |
| ISIC classification 2018 | ResNet152 | Accuracy | 0.887 | 0.896 | download |
| Kidney segmentation | UNet | Dice | 0.8786 | 0.8634 | download |
| Kidney classification | ResNet101 | Accuracy | 0.5545 | 0.6489 | download |
| MS segmentation | Nabla | Dice | 0.83 | 0.81 | download |
| | Model | Metric | Validation | Test | ONNX |
|---|---|---|---|---|---|
| ISIC segmentation | DeepLabV3Plus | MIoU | 0.746 | 0.746 | download |
| ISIC segmentation | SegNet (BCE) | MIoU | 0.750 | 0.770 | download |
| ISIC segmentation | SegNet (Dice) | MIoU | 0.756 | 0.768 | download |
| ISIC segmentation | UNet++ | MIoU | 0.782 | 0.771 | download |
| ISIC segmentation | UNet | MIoU | 0.770 | 0.763 | download |
| ISIC segmentation | LinkNet (VGG) | MIoU | 0.770 | 0.752 | download |
| ISIC segmentation | LinkNet (ResNet101) | MIoU | 0.762 | 0.763 | download |
| Ensemble ISIC segmentation | | MIoU | 0.794 | 0.800 | |
- Examples of output for the pre-trained models provided:
  - ISIC segmentation test set: the red line represents the prediction processed by ECVL to obtain contours that are overlaid on the original image.
  - Pneumothorax segmentation validation set: the red area represents the prediction, the green area the ground truth. The yellow area therefore represents the correctly predicted pixels.
  - Multiple Sclerosis Lesion segmentation validation set.