SynthText is a synthetically generated dataset, in which word instances are placed in natural scene images, while taking into account the scene layout.
Download the SynthText.zip file and unzip it into the [path-to-data-dir] folder:
path-to-data-dir/
├── SynthText/
│ ├── 1/
│ │ ├── ant+hill_1_0.jpg
│ │ └── ...
│ ├── 2/
│ │ ├── ant+hill_4_0.jpg
│ │ └── ...
│ ├── ...
│ └── gt.mat
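After unzipping, it may help to confirm that the folder matches the layout above. The following is a minimal sketch using only the Python standard library; `summarize_synthtext` is a hypothetical helper written for illustration and is not part of the repository's tooling.

```python
from pathlib import Path


def summarize_synthtext(data_dir):
    """Hypothetical helper: report whether gt.mat exists and how many
    .jpg files each numbered subfolder of SynthText/ contains."""
    root = Path(data_dir) / "SynthText"
    if not root.is_dir():
        # Nothing was unzipped here (or the path is wrong).
        return {"has_gt": False, "image_counts": {}}
    counts = {
        sub.name: sum(1 for _ in sub.glob("*.jpg"))
        for sub in sorted(root.iterdir())
        if sub.is_dir()
    }
    return {"has_gt": (root / "gt.mat").is_file(), "image_counts": counts}
```

Calling `summarize_synthtext("path-to-data-dir")` returns a small dictionary; an empty `image_counts` or `has_gt` set to `False` suggests the archive was extracted into a different location.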
⚠️ Additionally, it is strongly recommended to pre-process the SynthText dataset before using it, as it contains some faulty data:
python tools/dataset_converters/convert.py --dataset_name=synthtext --task=det --label_dir=/path-to-data-dir/SynthText/gt.mat --output_path=/path-to-data-dir/SynthText/gt_processed.mat --image_dir=/path-to-data-dir/SynthText
This operation will generate a filtered output in the same format as the original SynthText.
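Because the pre-processing command repeats the data directory three times, the paths can be derived from a single value to avoid typos. This is a rough sketch only: `build_convert_command` is a hypothetical helper, it uses just the script path and flags shown in the command above, and it assumes the command is launched from the repository root.

```python
from pathlib import Path


def build_convert_command(data_dir):
    """Hypothetical helper: assemble the pre-processing command as an
    argument list, deriving every path from one data directory."""
    root = Path(data_dir) / "SynthText"
    return [
        "python",
        "tools/dataset_converters/convert.py",
        "--dataset_name=synthtext",
        "--task=det",
        f"--label_dir={root / 'gt.mat'}",
        f"--output_path={root / 'gt_processed.mat'}",
        f"--image_dir={root}",
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, or joined with spaces and pasted into a shell.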