This is the official repository for the paper "TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution". TextDiff is a mask-guided residual diffusion model for scene text image super-resolution (see the paper for details).
Requirements:
- python >= 3.7
- pytorch >= 1.7.0
- torchvision >= 0.8.0
- lmdb >= 0.98
- pillow >= 7.1.2
- numpy
- six
- tqdm
- opencv-python
- easydict
- pyyaml
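As a quick sanity check of the environment, the sketch below tries to import each dependency and prints the version it finds. The import names are assumptions based on the usual packaging of these libraries (for example `PIL` for pillow, `cv2` for OpenCV, `torch` for pytorch), and the helper `check_modules` is hypothetical, not part of this repository. It only reports versions; it does not compare them against the minimums listed above.

```python
import importlib

# Import names assumed for the packages listed above (e.g. pillow -> PIL,
# OpenCV -> cv2, pytorch -> torch); adjust them if your setup differs.
REQUIRED_MODULES = [
    "torch", "torchvision", "lmdb", "PIL", "numpy",
    "six", "tqdm", "cv2", "easydict", "yaml",
]


def check_modules(names):
    """Map each module name to its version string, "unknown", or "missing"."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The module could not be imported at all.
            report[name] = "missing"
        else:
            # Not every package exposes __version__, so fall back to "unknown".
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


if __name__ == "__main__":
    for name, status in check_modules(REQUIRED_MODULES).items():
        print(f"{name:12s} {status}")
```

Any entry reported as `missing` needs to be installed before training or inference; entries reported as `unknown` are importable but do not expose a version attribute.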
To-do:
- Add training code
- Add inference code
- Use DPM_solver to reduce the number of inference (sampling) steps
- If you find TextDiff helpful, please give it a star. Thank you!
- If you have any questions, please open an issue and I will reply as soon as possible.
- If you use TextDiff as a baseline for your project, please cite our paper.
Related works:
- [1] Scene text telescope: Text-focused scene image super-resolution
- [2] Activating more pixels in image super-resolution transformer
- [3] Srdiff: Single image super-resolution with diffusion probabilistic models
- [4] DocDiff: Document Enhancement via Residual Diffusion Models
- [5] Improving Scene Text Image Super-Resolution via Dual Prior Modulation Network
If you use (part of) my code or find my work helpful, please consider citing:
@article{liu2023textdiff,
title={TextDiff: Mask-Guided Residual Diffusion Models for Scene Text Image Super-Resolution},
author={Liu, Baolin and Yang, Zongyuan and Wang, Pengfei and Zhou, Junjie and Liu, Ziqi and Song, Ziyi and Liu, Yan and Xiong, Yongping},
journal={arXiv preprint arXiv:2308.06743},
year={2023}
}
This code builds on DocDiff and TATT. Thanks to the authors of these great projects. DocDiff is my classmate's main research project, and I contributed to part of that research.