TensorFlow DeepLab Model Zoo

We provide DeepLab models pretrained on the PASCAL VOC 2012 and Cityscapes datasets for reproducing our results, as well as some checkpoints that are only pretrained on ImageNet for training your own models.

DeepLab models trained on PASCAL VOC 2012

The un-tarred directory includes:

  • a frozen inference graph (frozen_inference_graph.pb). All frozen inference graphs use an output stride of 8 and a single eval scale of 1.0. No left-right flips are used, and MobileNet-v2-based models do not include the decoder module.

  • a checkpoint (model.ckpt.data-00000-of-00001, model.ckpt.index)
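A frozen graph is usually fed one RGB image at a time. As a minimal sketch, assuming the preprocessing of the DeepLab demo notebook (resize so the longer side equals 513 pixels while preserving the aspect ratio, then feed the `ImageTensor:0` input and read `SemanticPredictions:0`; the size and tensor names are assumptions, not stated on this page), the target input size can be computed in plain Python:

```python
from typing import Tuple


def demo_input_size(width: int, height: int, input_size: int = 513) -> Tuple[int, int]:
    """Return (width, height) scaled so the longer side equals input_size.

    Assumption: mirrors the resize step of the DeepLab demo notebook; the
    default of 513 is an assumed value, not read from the frozen graph.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    resize_ratio = input_size / max(width, height)
    # Truncate to integers, preserving the aspect ratio (small images are upscaled).
    return int(resize_ratio * width), int(resize_ratio * height)


print(demo_input_size(1024, 768))  # (513, 384)
```

Because the graphs are evaluated at a single scale of 1.0 without flips, one resized image corresponds to one forward pass.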

Model details

We provide several checkpoints that have been pretrained on the VOC 2012 train_aug set or on the train_aug + trainval sets. In the former case, one can train a model with a smaller batch size and freeze batch normalization when GPU memory is limited, since the batch normalization parameters have already been fine-tuned. In the latter case, one can directly evaluate the checkpoints on the VOC 2012 test set or use them for a demo. Note that MobileNet-v2-based models do not employ the ASPP and decoder modules, for fast computation.
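To illustrate the former case, here is a hedged sketch that assembles a command line for DeepLab's training script with a small batch and frozen batch normalization. The flag names are assumed from the DeepLab `train.py` script and may differ between versions; the helper `finetune_command` and the checkpoint path are hypothetical.

```python
from typing import List


def finetune_command(checkpoint_prefix: str, batch_size: int = 4) -> List[str]:
    """Build a hypothetical deeplab/train.py command for an Xception_65 checkpoint.

    Assumption: flag names follow the DeepLab training script; check them
    against your checkout before running.
    """
    return [
        "python", "deeplab/train.py",
        "--model_variant=xception_65",
        # ASPP rates listed for output stride 16.
        "--atrous_rates=6", "--atrous_rates=12", "--atrous_rates=18",
        "--output_stride=16",
        "--decoder_output_stride=4",
        f"--train_batch_size={batch_size}",
        # Keep the already fine-tuned batch-norm parameters fixed.
        "--fine_tune_batch_norm=false",
        # Pass the checkpoint prefix, not the .index or .data file itself.
        f"--tf_initial_checkpoint={checkpoint_prefix}",
    ]


cmd = finetune_command("/tmp/xception_coco_voc_trainaug/model.ckpt")  # hypothetical path
print(" ".join(cmd))
```

Freezing batch normalization is what makes the small batch viable: batch statistics estimated from only a few images would otherwise be noisy.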

| Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder |
| --- | --- | --- | --- | --- |
| mobilenetv2_coco_voc_trainaug | MobileNet-v2 | MS-COCO, VOC 2012 train_aug set | N/A | N/A |
| mobilenetv2_coco_voc_trainval | MobileNet-v2 | MS-COCO, VOC 2012 train_aug + trainval sets | N/A | N/A |
| xception_coco_voc_trainaug | Xception_65 | MS-COCO, VOC 2012 train_aug set | [6,12,18] for OS=16; [12,24,36] for OS=8 | OS = 4 |
| xception_coco_voc_trainval | Xception_65 | MS-COCO, VOC 2012 train_aug + trainval sets | [6,12,18] for OS=16; [12,24,36] for OS=8 | OS = 4 |

In the table, OS denotes output stride.
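The ASPP rates in the table above double when the output stride halves, so that rate × output stride stays constant (6 × 16 = 12 × 8) and each branch covers the same field of view in input pixels. A minimal sketch of that relationship, assuming DeepLab's `(size - 1) / stride + 1` convention for feature-map sizes (which is why crop sizes such as 513 are common); both helpers are illustrative:

```python
from typing import List


def aspp_rates(output_stride: int) -> List[int]:
    """Xception_65 ASPP atrous rates for the output strides in the table."""
    if output_stride not in (8, 16):
        raise ValueError("the table only lists OS=8 and OS=16")
    factor = 16 // output_stride  # 1 at OS=16, 2 at OS=8
    return [6 * factor, 12 * factor, 18 * factor]


def feature_size(input_size: int, output_stride: int) -> int:
    """Spatial size of a feature map at the given output stride.

    Assumption: DeepLab-style (size - 1) // stride + 1 rounding.
    """
    return (input_size - 1) // output_stride + 1


for os_ in (16, 8):
    print(os_, aspp_rates(os_), feature_size(513, os_))
# 16 [6, 12, 18] 33
# 8 [12, 24, 36] 65
```

The decoder operates at OS = 4, where a 513-pixel input gives `feature_size(513, 4) == 129`.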

| Checkpoint name | Eval OS | Eval scales | Left-right flip | Multiply-Adds | Runtime (sec) | PASCAL mIOU | File size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mobilenetv2_coco_voc_trainaug | 16 | [1.0] | No | 2.75B | 0.1 | 75.32% (val) | 23MB |
| mobilenetv2_coco_voc_trainaug | 8 | [0.5:0.25:1.75] | Yes | 152.59B | 26.9 | 77.33% (val) | 23MB |
| mobilenetv2_coco_voc_trainval | 8 | [0.5:0.25:1.75] | Yes | 152.59B | 26.9 | 80.25% (test) | 23MB |
| xception_coco_voc_trainaug | 16 | [1.0] | No | 54.17B | 0.7 | 82.20% (val) | 439MB |
| xception_coco_voc_trainaug | 8 | [0.5:0.25:1.75] | Yes | 3055.35B | 223.2 | 83.58% (val) | 439MB |
| xception_coco_voc_trainval | 8 | [0.5:0.25:1.75] | Yes | 3055.35B | 223.2 | 87.80% (test) | 439MB |

In the table, we report both computational complexity (in terms of Multiply-Adds and CPU runtime) and segmentation performance (in terms of mIOU) on the PASCAL VOC val or test set. The reported runtime is measured with tfprof on a workstation with an Intel Xeon E5-1650 v3 CPU @ 3.50GHz and 32GB of memory. Note that applying multi-scale inputs and left-right flips improves segmentation performance but also significantly increases the computational cost, and thus may not be suitable for real-time applications.
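mIOU is the per-class intersection-over-union, TP / (TP + FP + FN), averaged over classes. A minimal sketch from a confusion matrix, assuming ignored pixels (label 255 in PASCAL VOC) were already excluded and that classes absent from both ground truth and prediction are skipped; the official evaluation accumulates the matrix over the whole val or test set, and the tables report the result as a percentage:

```python
from typing import List, Sequence


def mean_iou(confusion: Sequence[Sequence[int]]) -> float:
    """Mean intersection-over-union from a square confusion matrix.

    confusion[i][j] counts pixels of ground-truth class i predicted as class j.
    """
    n = len(confusion)
    ious: List[float] = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # row total minus TP
        fp = sum(confusion[r][c] for r in range(n)) - tp    # column total minus TP
        denom = tp + fp + fn
        if denom > 0:  # skip classes that never occur
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


print(round(mean_iou([[3, 1], [2, 4]]), 4))  # 0.5357
```

Here class 0 has IoU 3/6 and class 1 has IoU 4/7, so the mean is about 0.5357, i.e. 53.57% in the tables' notation.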

DeepLab models trained on Cityscapes

Model details

We provide several checkpoints that have been pretrained on the Cityscapes train_fine set. Note that the MobileNet-v2-based model has also been pretrained on the MS-COCO dataset and does not employ the ASPP and decoder modules, for fast computation.

| Checkpoint name | Network backbone | Pretrained dataset | ASPP | Decoder |
| --- | --- | --- | --- | --- |
| mobilenetv2_coco_cityscapes_trainfine | MobileNet-v2 | MS-COCO, Cityscapes train_fine set | N/A | N/A |
| xception_cityscapes_trainfine | Xception_65 | ImageNet, Cityscapes train_fine set | [6,12,18] for OS=16; [12,24,36] for OS=8 | OS = 4 |

In the table, OS denotes output stride.

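| Checkpoint name | Eval OS | Eval scales | Left-right flip | Multiply-Adds | Runtime (sec) | Cityscapes mIOU | File size |
| --- | --- | --- | --- | --- | --- | --- | --- |
| mobilenetv2_coco_cityscapes_trainfine | 16 | [1.0] | No | 21.27B | 0.8 | 70.71% (val) | 23MB |
| mobilenetv2_coco_cityscapes_trainfine | 8 | [0.75:0.25:1.25] | Yes | 433.24B | 51.12 | 73.57% (val) | 23MB |
| xception_cityscapes_trainfine | 16 | [1.0] | No | 418.64B | 5.0 | 78.79% (val) | 439MB |
| xception_cityscapes_trainfine | 8 | [0.75:0.25:1.25] | Yes | 8677.92B | 422.8 | 80.42% (val) | 439MB |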
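The eval-scale notation in these tables, such as [0.75:0.25:1.25], reads as start:step:stop with an inclusive stop, so it expands to the scales 0.75, 1.0 and 1.25. As a sketch under that reading (the parser is illustrative, not DeepLab's own code), the number of network evaluations per image is one per scale, doubled when left-right flips are added:

```python
from typing import List


def parse_scales(spec: str) -> List[float]:
    """Expand "[start:step:stop]" (inclusive stop) or "[s]" into a list of scales."""
    parts = [float(p) for p in spec.strip().strip("[]").split(":")]
    if len(parts) == 1:
        return parts
    if len(parts) != 3:
        raise ValueError(f"unexpected scale spec: {spec!r}")
    start, step, stop = parts
    count = int(round((stop - start) / step)) + 1
    # Round to suppress floating-point drift in the generated scales.
    return [round(start + i * step, 6) for i in range(count)]


def forward_passes(spec: str, flip: bool) -> int:
    """Network evaluations per image: one per scale, doubled with flips."""
    return len(parse_scales(spec)) * (2 if flip else 1)


print(parse_scales("[0.75:0.25:1.25]"))           # [0.75, 1.0, 1.25]
print(forward_passes("[0.5:0.25:1.75]", True))    # 12
```

The pass count alone understates the Multiply-Adds growth in the tables, because the per-pass cost also depends on the input scale and on the output stride (8 instead of 16).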

Checkpoints pretrained on ImageNet

The un-tarred directory includes:

  • a model checkpoint (model.ckpt.data-00000-of-00001, model.ckpt.index).

Model details

We also provide some checkpoints that are only pretrained on ImageNet, which you can use for training your own models.

  • mobilenet_v2: We refer interested users to the TensorFlow open-source MobileNet-V2 implementation for details.

  • xception: We adapt the original Xception model to the task of semantic segmentation with the following changes: (1) more layers, (2) all max pooling operations are replaced by strided (atrous) separable convolutions, and (3) extra batch normalization and ReLU layers are added after each 3x3 depthwise convolution.
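The Xception changes above build on depthwise separable convolutions, which split a k x k convolution into a per-channel (depthwise) k x k convolution followed by a 1x1 (pointwise) convolution. As an illustrative sketch (not the DeepLab code), this compares weight counts, using 728 channels as an example width from Xception's middle flow; biases and the extra batch-norm parameters from change (3) are not counted:

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise k x k conv followed by a 1x1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 conv mixes channels
    return depthwise + pointwise


print(conv_params(3, 728, 728), separable_params(3, 728, 728))  # 4769856 536536
```

The separable form uses roughly 8.9x fewer weights here, which is why replacing max pooling with strided separable convolutions stays affordable.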

| Model name | File size |
| --- | --- |
| xception | 447MB |
