
The dataset used in the CVPR 2022 paper (SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization).

Canjie-Luo/Real-300K

Real-300K Dataset

  • This is the dataset used in the CVPR'22 paper "SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization."
  • It contains 293,096 realistic and diverse scene text images collected from public real-world training datasets.
  • See the paper for details.

Download

  • Download the dataset from Google Drive or Baidu Cloud (Password: u8qe).
  • The MD5 value of Real-300K-DataBase.zip should be 'c3c9a91498f547ee24af52b573fb47be'.
  • Unzip the Real-300K-DataBase.zip file.
|___Real-300K-DataBase
|       |___data.mdb
|       |___lock.mdb
|___lmdb_visual.py
|___requirements.txt
|___README.md
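
The MD5 check mentioned above can be scripted before unzipping. The following is a minimal sketch using only the Python standard library. The helper name `md5_of_file` and the chunk size are illustrative choices, not part of this repository; the expected digest is the one quoted above.

```python
import hashlib

# Expected digest, copied from the Download section above.
EXPECTED_MD5 = "c3c9a91498f547ee24af52b573fb47be"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for a multi-gigabyte archive.
    (Hypothetical helper, not shipped with the dataset.)
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Assumes the archive sits in the current working directory.
    actual = md5_of_file("Real-300K-DataBase.zip")
    if actual == EXPECTED_MD5:
        print("MD5 matches; the download looks intact.")
    else:
        print(f"MD5 mismatch: got {actual}; consider downloading again.")
```

A mismatch usually points to an interrupted or corrupted download rather than a problem with the dataset itself.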

Requirements

Python==3.7

pip install -r requirements.txt

Visualization

python3 lmdb_visual.py

Check the demo folder for details.
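
For a quick look at the database without running the visualization script, the sketch below opens the LMDB read-only and lists a few entries. It does not reproduce `lmdb_visual.py`. It assumes the third-party `lmdb` package is installed (presumably via `requirements.txt`, which is not shown here) and that the unzipped `Real-300K-DataBase` folder is in the current directory. The key layout is not documented on this page, so the code makes no assumption about key names and only guesses whether each value is a JPEG or PNG from its leading bytes. `sniff_image_format` and `peek_lmdb` are hypothetical helper names.

```python
import itertools


def sniff_image_format(value):
    """Guess the image format of a raw value from its magic bytes."""
    if value.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if value.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    return "unknown"


def peek_lmdb(db_path="Real-300K-DataBase", limit=5):
    """Return (entry_count, rows) for the first `limit` records.

    Each row is (key, value_length, guessed_format). Nothing is assumed
    about how keys are named inside the database.
    """
    import lmdb  # third-party; imported here so the helper above stays stdlib-only

    # Read-only and lock-free, so the dataset files are never modified.
    env = lmdb.open(db_path, readonly=True, lock=False,
                    readahead=False, meminit=False)
    try:
        total = env.stat()["entries"]
        with env.begin(write=False) as txn:
            rows = [
                (key, len(value), sniff_image_format(bytes(value)))
                for key, value in itertools.islice(txn.cursor(), limit)
            ]
    finally:
        env.close()
    return total, rows


if __name__ == "__main__":
    total, rows = peek_lmdb()
    print(f"{total} entries in the database")
    for key, size, fmt in rows:
        print(key, size, fmt)
```

Note that the entry count reported by LMDB includes every key in the database, so it may differ from the number of images if the database also stores labels or metadata.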

Citation

@inproceedings{luo2022siman,
  title={SimAN: Exploring Self-Supervised Representation Learning of Scene Text via Similarity-Aware Normalization},
  author={Luo, Canjie and Jin, Lianwen and Chen, Jingdong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={1039--1048},
  year={2022}
}
