Official code repository for: Polyp Segmentation with the FCB-SwinV2 Transformer
Authors: Kerr Fitzgerald, Jorge Bernal, Aymeric Histace and Bogdan J. Matuszewski
Links to the paper:
Polyp Segmentation with the FCB-SwinV2 Transformer (IEEE Access)
Polyp segmentation within colonoscopy video frames using deep learning models has the potential to automate colonoscopy screening procedures. This could help improve the early lesion detection rate and in vivo characterization of polyps which could develop into colorectal cancer. Recent state-of-the-art deep learning polyp segmentation models have combined Convolutional Neural Network (CNN) architectures and Transformer Network (TN) architectures. Motivated by the aim of improving the performance of polyp segmentation models and their robustness to data variations beyond those covered during training, we propose a new CNN-TN hybrid model named the FCB-SwinV2 Transformer. This model was created by making extensive modifications to the recent state-of-the-art FCN-Transformer, including replacing the TN branch architecture with a SwinV2-UNet. The performance of the FCB-SwinV2 Transformer is evaluated on the popular colonoscopy segmentation benchmarking datasets Kvasir-SEG, CVC-ClinicDB and ETIS-LaribPolypDB. Generalizability tests are also conducted to determine if models can maintain accuracy when evaluated on data outside of the training distribution. The FCB-SwinV2 Transformer consistently achieves higher mean Dice and mean IoU scores when compared to other models reported in the literature and therefore represents new state-of-the-art performance. The importance of understanding subtleties in evaluation metrics and dataset partitioning is also demonstrated and discussed.
Figure 1: Overall FCB-SwinV2 Transformer architecture consisting of a Transformer Branch (TB) and a Fully Convolutional Branch (FCB) which work in parallel.
Figure 2: SwinV2-UNet architecture used as the TB of the FCB-SwinV2 Transformer. The encoder stages reduce the spatial dimensions of feature maps while increasing the number of channels. Skip connections are used to pass feature maps generated by each stage of the encoder to the decoder stages. The encoder is pre-trained using ImageNet22K.
Figure 3: (a) The decoder block [35] uses channel-wise concatenation to combine the previous decoder layer output with the encoder skip connection output. (b) The structure of the SCSE module, which combines the outputs of the sSE (spatial squeeze-and-excitation) and cSE (channel squeeze-and-excitation) modules.
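To make the decoder block and SCSE module of Figure 3 concrete, below is a minimal PyTorch sketch, not the repository's exact implementation; the class names, channel arguments, reduction ratio and upsampling choice are assumptions. It illustrates the channel-wise concatenation of the encoder skip connection and the combination of the two squeeze-and-excitation branches.

```python
import torch
import torch.nn as nn

class SCSE(nn.Module):
    """Concurrent spatial and channel squeeze-and-excitation (illustrative sketch)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel SE: global average pooling followed by a bottleneck of 1x1 convolutions.
        self.cse = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial SE: a 1x1 convolution producing a per-pixel attention map.
        self.sse = nn.Sequential(nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid())

    def forward(self, x):
        # Outputs of the two branches are recalibrated copies of x, combined by addition.
        return x * self.cse(x) + x * self.sse(x)

class DecoderBlock(nn.Module):
    """Decoder stage: upsample, concatenate the skip connection, convolve, apply SCSE."""
    def __init__(self, in_channels, skip_channels, out_channels):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels + skip_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )
        self.scse = SCSE(out_channels)

    def forward(self, x, skip):
        x = self.upsample(x)
        # Channel-wise concatenation of decoder features with the encoder skip connection.
        x = torch.cat([x, skip], dim=1)
        return self.scse(self.conv(x))
```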
- Results produced in this work were generated using an Nvidia RTX 3090 GPU on a Linux operating system (Ubuntu).
- Clone the repository and navigate to the new directory.
- Download and extract polyp segmentation datasets (e.g. Kvasir-SEG).
- Resize the images and masks so they are of size 384x384 (a resizing sketch is given after this list).
- To compare against the exact data splits used in this study, download the CSV files contained in the 'split_information' folder within this repository (a sketch for reading them is also given after this list).
- Download the SwinV2 ImageNet pre-trained weights.
- Create folders for saving results (change the save_string parameter), define paths to the image/mask folders, define paths to the .csv split files, and create/define a folder path for saving mask predictions.
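As referenced in the resizing step above, the following is a minimal sketch of one way to resize images and masks to 384x384 using Pillow; the folder names and the .jpg extension are assumptions based on the Kvasir-SEG layout and should be adapted to the dataset actually used.

```python
from pathlib import Path
from PIL import Image

# Hypothetical folder layout; adjust to wherever the dataset was extracted.
IMAGE_DIR = Path("Kvasir-SEG/images")
MASK_DIR = Path("Kvasir-SEG/masks")
OUT_IMAGE_DIR = Path("Kvasir-SEG/images_384")
OUT_MASK_DIR = Path("Kvasir-SEG/masks_384")

for src_dir, dst_dir, resample in [
    (IMAGE_DIR, OUT_IMAGE_DIR, Image.BILINEAR),  # smooth interpolation for images
    (MASK_DIR, OUT_MASK_DIR, Image.NEAREST),     # nearest neighbour keeps masks binary
]:
    dst_dir.mkdir(parents=True, exist_ok=True)
    for path in src_dir.glob("*.jpg"):
        Image.open(path).resize((384, 384), resample).save(dst_dir / path.name)
```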
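For the data-split step, here is a sketch of how the split CSV files might be read, assuming (hypothetically) that each file in 'split_information' lists one image file name per row; the file names and CSV layout shown here are assumptions, so check the actual files before use.

```python
import csv

def read_split(csv_path):
    """Return the image file names stored in one split CSV (assumed one name per row)."""
    with open(csv_path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]

# Hypothetical file names; use the CSVs actually provided in 'split_information'.
train_files = read_split("split_information/kvasir_train_split.csv")
val_files = read_split("split_information/kvasir_val_split.csv")
test_files = read_split("split_information/kvasir_test_split.csv")
```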
Figure 4: Predictions made by the FCB-SwinV2 Transformer for images from the test set of the Kvasir-SEG dataset when trained and evaluated using the DUCK-Net data partitions. DUCKNET-34 and FCN-Transformer predicted segmentation maps are included for comparison.
This repository is released under the MIT License as found in the LICENSE.txt file.
If you use this work, please cite:
K. Fitzgerald, J. Bernal, A. Histace and B. J. Matuszewski, "Polyp Segmentation with the FCB-SwinV2 Transformer," in IEEE Access, doi: 10.1109/ACCESS.2024.3376228.
This work makes use of data from the Kvasir-SEG dataset, available at https://datasets.simula.no/kvasir-seg/.
This work makes use of data from the CVC-ClinicDB dataset, available at https://polyp.grand-challenge.org/CVCClinicDB/.
This work makes use of data from the ETIS-LaribPolypDB dataset, available at https://polyp.grand-challenge.org/ETISLarib/.
Results are obtained using ImageNet pre-trained weights for the SwinV2 encoder, available at: SwinV2 Encoder Weights.
This repository includes code from the following sources:
Links: CVML Group
Contact: kffitzgerald@uclan.ac.uk