- The dataset used in the CVPR'22 paper entitled "SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization."
- It contains 293096 realistic and diverse scene text images collected from public real training datasets.
- Check the paper for details.
- Download the dataset from Google Drive or Baidu Cloud (Password: u8qe).
- The MD5 value of `Real-300K-DataBase.zip` should be 'c3c9a91498f547ee24af52b573fb47be'.
- Unzip the `Real-300K-DataBase.zip` file.
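
The MD5 check can be done with any checksum tool. As a minimal sketch in Python, using only the standard library: the helper names `md5_of_file` and `verify_archive` are illustrative and not part of this repository, and the archive is assumed to be in the current working directory.

```python
import hashlib
from pathlib import Path

# Checksum quoted in this README for Real-300K-DataBase.zip.
EXPECTED_MD5 = "c3c9a91498f547ee24af52b573fb47be"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True when the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

Calling `verify_archive("Real-300K-DataBase.zip")` returns `True` for an intact download. A `False` result usually means the download was interrupted and should be repeated.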
|___Real-300K-DataBase
| |___data.mdb
| |___lock.mdb
|___lmdb_visual.py
|___requirements.txt
|___README.md
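
After unzipping, the layout can be sanity-checked before running anything else. This is a small sketch: the function name `missing_lmdb_files` is hypothetical, and the file names are taken from the directory tree in this README.

```python
from pathlib import Path

# LMDB files expected inside the Real-300K-DataBase folder (from the tree in this README).
EXPECTED_FILES = ("data.mdb", "lock.mdb")


def missing_lmdb_files(database_dir):
    """Return the expected LMDB file names that are absent from database_dir."""
    root = Path(database_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

`missing_lmdb_files("Real-300K-DataBase")` returns an empty list when both files are present.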
Python==3.7
pip install -r requirements.txt
python3 lmdb_visual.py
Check the demo folder for details.
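
For a quick look at the database without the visualization script, the sketch below opens the LMDB read-only and reports the entry count plus the first few keys. It is not the logic of `lmdb_visual.py`. It assumes the third-party `lmdb` package is installed (presumably via `requirements.txt`), and it makes no assumption about the key naming or the value encoding, so it only reports value sizes. The function name `peek_lmdb` is illustrative.

```python
def peek_lmdb(database_dir, limit=5):
    """Return (total_entries, [(key, value_size_in_bytes), ...]) for the first records."""
    # Imported here so the module still loads when lmdb is not installed.
    import lmdb  # third-party package; assumed to be listed in requirements.txt

    # Read-only access without locking, so the database files are not modified.
    env = lmdb.open(str(database_dir), readonly=True, lock=False,
                    readahead=False, meminit=False)
    try:
        total = env.stat()["entries"]
        samples = []
        with env.begin(write=False) as txn:
            # Iterating a cursor yields (key, value) pairs in key order.
            for key, value in txn.cursor():
                if len(samples) >= limit:
                    break
                samples.append((key.decode("utf-8", errors="replace"), len(value)))
    finally:
        env.close()
    return total, samples
```

`peek_lmdb("Real-300K-DataBase")` gives a first impression of how the records are keyed. Decoding the values themselves depends on the storage format, which the paper and `lmdb_visual.py` define.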
@inproceedings{luo2022siman,
title={SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization},
author={Luo, Canjie and Jin, Lianwen and Chen, Jingdong},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={1039--1048},
year={2022}
}