MSRA Text Detection 500 Database(MSRA-TD500)Download Link
Please download the data from the website above and unzip the file. After unzipping the file, the data structure should be like:
MSRA-TD500
├── test
│ ├── IMG_0059.gt
│ ├── IMG_0059.JPG
│ ├── IMG_0080.gt
│ ├── IMG_0080.JPG
│ ├── ...
├── train
│ ├── IMG_0030.gt
│ ├── IMG_0030.JPG
│ ├── IMG_0063.gt
│ ├── IMG_0063.JPG
│ ├── ...
To prepare the data for text detection, you can run the following commands:
python tools/dataset_converters/convert.py \
--dataset_name td500 --task det \
--image_dir path/to/MSRA-TD500/train/ \
--label_dir path/to/MSRA-TD500/train \
--output_path path/to/MSRA-TD500/train_det_gt.txt
python tools/dataset_converters/convert.py \
--dataset_name td500 --task det \
--image_dir path/to/MSRA-TD500/test/ \
--label_dir path/to/MSRA-TD500/test \
--output_path path/to/MSRA-TD500/test_det_gt.txt
Then you can have two annotation files train_det_gt.txt
and test_det_gt.txt
under the folder MSRA-TD500/
.