diff --git a/paper_zoo/textrecog/ASTER: An Attentional Scene Text Recognizer with Flexible Rectification.yaml b/paper_zoo/textrecog/ASTER: An Attentional Scene Text Recognizer with Flexible Rectification.yaml
new file mode 100644
index 000000000..75ab7e20f
--- /dev/null
+++ b/paper_zoo/textrecog/ASTER: An Attentional Scene Text Recognizer with Flexible Rectification.yaml
@@ -0,0 +1,74 @@
+Title: 'ASTER: An Attentional Scene Text Recognizer with Flexible Rectification'
+Abbreviation: ASTER
+Tasks:
+  - TextRecog
+Venue: TPAMI
+Year: 2018
+Lab/Company:
+  - Huazhong University of Science and Technology, Wuhan, China
+URL:
+  Venue: 'https://ieeexplore.ieee.org/abstract/document/8395027/'
+  Arxiv: N/A
+Paper Reading URL: N/A
+Code: 'https://github.com/ayumiymk/aster.pytorch'
+Supported In MMOCR: 'https://github.com/open-mmlab/mmocr/tree/dev-1.x/configs/textrecog/aster'
+PaperType:
+  - Algorithm
+Abstract: 'A challenging aspect of scene text recognition is to handle text with
+distortions or irregular layout. In particular, perspective text and curved
+text are common in natural scenes and are difficult to recognize. In this work,
+we introduce ASTER, an end-to-end neural network model that comprises a
+rectification network and a recognition network. The rectification network
+adaptively transforms an input image into a new one, rectifying the text in it.
+It is powered by a flexible Thin-Plate Spline transformation which handles a
+variety of text irregularities and is trained without human annotations. The
+recognition network is an attentional sequence-to-sequence model that predicts
+a character sequence directly from the rectified image. The whole model is
+trained end to end, requiring only images and their groundtruth text. Through
+extensive experiments, we verify the effectiveness of the rectification and
+demonstrate the state-of-the-art recognition performance of ASTER. Furthermore,
+we demonstrate that ASTER is a powerful component in end-to-end recognition
+systems, for its ability to enhance the detector.'
+MODELS:
+  Architecture:
+    - Attention
+  Learning Method:
+    - Supervised
+  Language Modality:
+    - Implicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213168893-7c600e03-c1f0-464a-8236-40ae26fbff89.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: N/A
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: N/A
+  Experiment:
+    Training DataSets:
+      - MJ
+      - ST
+    Test DataSets:
+      Avg.: 86.0
+      IIIT5K:
+        WAICS: 93.4
+      SVT:
+        WAICS: 93.6
+      IC13:
+        WAICS: 94.5
+      IC15:
+        WAICS: 76.1
+      SVTP:
+        WAICS: 78.5
+      CUTE:
+        WAICS: 79.5
+Bibtex: '@article{shi2018aster,
+  title={Aster: An attentional scene text recognizer with flexible rectification},
+  author={Shi, Baoguang and Yang, Mingkun and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
+  journal={IEEE transactions on pattern analysis and machine intelligence},
+  volume={41},
+  number={9},
+  pages={2035--2048},
+  year={2018},
+  publisher={IEEE}
+}'
diff --git a/paper_zoo/textrecog/Aggregation Cross-Entropy for Sequence Recognition.yaml b/paper_zoo/textrecog/Aggregation Cross-Entropy for Sequence Recognition.yaml
new file mode 100644
index 000000000..d392b46c2
--- /dev/null
+++ b/paper_zoo/textrecog/Aggregation Cross-Entropy for Sequence Recognition.yaml
@@ -0,0 +1,68 @@
+Title: 'Aggregation Cross-Entropy for Sequence Recognition'
+Abbreviation: ACE
+Tasks:
+  - TextRecog
+Venue: CVPR
+Year: 2019
+Lab/Company:
+  - South China University of Technology
+URL:
+  Venue: 'http://openaccess.thecvf.com/content_CVPR_2019/html/Xie_Aggregation_Cross-Entropy_for_Sequence_Recognition_CVPR_2019_paper.html'
+  Arxiv: 'https://arxiv.org/abs/1904.08364'
+Paper Reading URL: N/A
+Code: 'https://github.com/summerlvsong/Aggregation-CrossEntropy'
+Supported In MMOCR: N/S
+PaperType:
+  - Algorithm
+Abstract: 'In this paper, we propose a novel method, aggregation cross-entropy
+(ACE), for sequence recognition from a brand new perspective. The ACE loss
+function exhibits competitive performance to CTC and the attention mechanism,
+with much quicker implementation (as it involves only four fundamental
+formulas), faster inference/back-propagation (approximately O(1) in parallel),
+less storage requirement (no parameter and negligible runtime memory), and
+convenient employment (by replacing CTC with ACE). Furthermore, the proposed
+ACE loss function exhibits two noteworthy properties: (1) it can be directly
+applied for 2D prediction by flattening the 2D prediction into 1D prediction
+as the input and (2) it requires only characters and their numbers in the
+sequence annotation for supervision, which allows it to advance beyond sequence
+recognition, e.g., counting problem.'
+MODELS:
+  Architecture:
+    - CTC
+  Learning Method:
+    - Supervised
+  Language Modality:
+    - Implicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213173571-fdf09df3-9769-4d52-bf44-6f58c9b5453d.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: N/A
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: N/A
+  Experiment:
+    Training DataSets:
+      - MJ
+      - ST
+    Test DataSets:
+      Avg.: 79.4
+      IIIT5K:
+        WAICS: 82.3
+      SVT:
+        WAICS: 82.6
+      IC13:
+        WAICS: 89.7
+      IC15:
+        WAICS: 68.9
+      SVTP:
+        WAICS: 70.1
+      CUTE:
+        WAICS: 82.6
+Bibtex: '@inproceedings{xie2019aggregation,
+  title={Aggregation cross-entropy for sequence recognition},
+  author={Xie, Zecheng and Huang, Yaoxiong and Zhu, Yuanzhi and Jin, Lianwen and Liu, Yuliang and Xie, Lele},
+  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
+  pages={6538--6547},
+  year={2019}
+}'
diff --git a/paper_zoo/textrecog/An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition.yaml b/paper_zoo/textrecog/An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition.yaml
new file mode 100644
index 000000000..26bb3e0c9
--- /dev/null
+++ b/paper_zoo/textrecog/An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition.yaml
@@ -0,0 +1,76 @@
+Title: 'An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition'
+Abbreviation: CRNN
+Tasks:
+  - TextRecog
+Venue: TPAMI
+Year: 2016
+Lab/Company:
+  - School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China
+URL:
+  Venue: 'https://ieeexplore.ieee.org/abstract/document/7801919/'
+  Arxiv: 'https://arxiv.org/abs/1507.05717'
+Paper Reading URL: N/A
+Code: 'https://github.com/bgshih/crnn'
+Supported In MMOCR: 'https://github.com/open-mmlab/mmocr/tree/1.x/configs/textrecog/crnn'
+PaperType:
+  - Algorithm
+Abstract: 'Image-based sequence recognition has been a longstanding research
+topic in computer vision. In this paper, we investigate the problem of scene
+text recognition, which is among the most important and challenging tasks in
+image-based sequence recognition. A novel neural network architecture, which
+integrates feature extraction, sequence modeling and transcription into a unified
+framework, is proposed. Compared with previous systems for scene text recognition,
+the proposed architecture possesses four distinctive properties: (1) It is
+end-to-end trainable, in contrast to most of the existing algorithms whose
+components are separately trained and tuned. (2) It naturally handles sequences
+in arbitrary lengths, involving no character segmentation or horizontal scale
+normalization. (3) It is not confined to any predefined lexicon and achieves
+remarkable performances in both lexicon-free and lexicon-based scene text
+recognition tasks. (4) It generates an effective yet much smaller model,
+which is more practical for real-world application scenarios. The experiments
+on standard benchmarks, including the IIIT-5K, Street View Text and ICDAR
+datasets, demonstrate the superiority of the proposed algorithm over the prior
+arts. Moreover, the proposed algorithm performs well in the task of image-based
+music score recognition, which evidently verifies the generality of it.'
+MODELS:
+  Architecture:
+    - CTC
+  Learning Method:
+    - Supervised
+  Language Modality:
+    - Implicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213174579-e89dbd14-8ace-4f16-9cb6-4b882dbd4e27.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: N/A
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: 8.3M
+  Experiment:
+    Training DataSets:
+      - MJ
+    Test DataSets:
+      Avg.: 81.9
+      IIIT5K:
+        WAICS: 78.2
+      SVT:
+        WAICS: 80.8
+      IC13:
+        WAICS: 86.7
+      IC15:
+        WAICS: N/A
+      SVTP:
+        WAICS: N/A
+      CUTE:
+        WAICS: N/A
+Bibtex: '@article{shi2016end,
+  title={An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition},
+  author={Shi, Baoguang and Bai, Xiang and Yao, Cong},
+  journal={IEEE transactions on pattern analysis and machine intelligence},
+  volume={39},
+  number={11},
+  pages={2298--2304},
+  year={2016},
+  publisher={IEEE}
+}'
diff --git a/paper_zoo/textrecog/Attention after Attention: Reading Text in the Wild with Cross Attention.yaml b/paper_zoo/textrecog/Attention after Attention: Reading Text in the Wild with Cross Attention.yaml
new file mode 100644
index 000000000..3741cc57b
--- /dev/null
+++ b/paper_zoo/textrecog/Attention after Attention: Reading Text in the Wild with Cross Attention.yaml
@@ -0,0 +1,74 @@
+Title: 'Attention after Attention: Reading Text in the Wild with Cross Attention'
+Abbreviation: Huang et al.
+Tasks:
+  - TextRecog
+Venue: ICDAR
+Year: 2019
+Lab/Company:
+  - School of Electronic and Information Engineering, South China University of Technology, Guangzhou 510641, China
+URL:
+  Venue: 'https://ieeexplore.ieee.org/abstract/document/8977967/'
+  Arxiv: N/A
+Paper Reading URL: N/A
+Code: N/A
+Supported In MMOCR: N/S
+PaperType:
+  - Algorithm
+Abstract: 'Recent methods mostly regarded scene text recognition as a
+sequence-to-sequence problem. These methods roughly transform the image into a
+feature sequence and use the algorithms for sequence-to-sequence problem like
+CTC or attention to decode the characters. However, text in images is distributed
+in a two-dimensional (2D) space and roughly converting the features of text
+into a feature sequence may introduce extra noise, especially if the text is
+irregular. In this paper, we propose a novel framework named cross attention
+network, which learns to attend to local features of a 2D feature map
+corresponding to individual characters. The network contains two 1D attention
+networks, which operates harmoniously in two directions. Thus, one of the
+attention modules vertically attends to the features corresponding to the whole
+text of 2D features and the other horizontal module selects the local features
+to decode individual characters. Extensive experiments are performed on various
+regular benchmarks, including SVT, ICDAR2003, ICDAR2013, and IIIT5K-Words,
+which demonstrate that the proposed model either outperforms or is comparable
+to all previous methods. Moreover, the model is evaluated on irregular benchmarks
+including SVTPerspective, CUTE80 and ICDAR 2015. The performance on irregular
+benchmarks shows the robustness of our model.'
+MODELS:
+  Architecture:
+    - Attention
+  Learning Method:
+    - Supervised
+  Language Modality:
+    - Implicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213172663-4c3c5ea1-84b8-40e5-8453-e389c7ee5595.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: N/A
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: N/A
+  Experiment:
+    Training DataSets:
+      - MJ
+      - ST
+    Test DataSets:
+      Avg.: 86.4
+      IIIT5K:
+        WAICS: 94.5
+      SVT:
+        WAICS: 90.0
+      IC13:
+        WAICS: 94.2
+      IC15:
+        WAICS: 75.3
+      SVTP:
+        WAICS: 79.8
+      CUTE:
+        WAICS: 84.7
+Bibtex: '@inproceedings{huang2019attention,
+  title={Attention after attention: Reading text in the wild with cross attention},
+  booktitle={2019 International Conference on Document Analysis and Recognition (ICDAR)},
+  year={2019},
+  organization={IEEE}
+}'
diff --git a/paper_zoo/textrecog/Decoupled Attention Network for Text Recognition.yaml b/paper_zoo/textrecog/Decoupled Attention Network for Text Recognition.yaml
new file mode 100644
index 000000000..6a44aaab0
--- /dev/null
+++ b/paper_zoo/textrecog/Decoupled Attention Network for Text Recognition.yaml
@@ -0,0 +1,75 @@
+Title: 'Decoupled Attention Network for Text Recognition'
+Abbreviation: DAN
+Tasks:
+  - TextRecog
+Venue: AAAI
+Year: 2020
+Lab/Company:
+  - School of Electronic and Information Engineering, South China University of Technology
+  - Lenovo Research
+URL:
+  Venue: 'https://ojs.aaai.org/index.php/AAAI/article/view/6903'
+  Arxiv: 'https://arxiv.org/abs/1912.10205'
+Paper Reading URL: N/A
+Code: 'https://github.com/Wang-Tianwei/Decoupled-attention-network'
+Supported In MMOCR: N/S
+PaperType:
+  - Algorithm
+Abstract: 'Text recognition has attracted considerable research interests because
+of its various applications. The cutting-edge text recognition methods are
+based on attention mechanisms. However, most of attention methods usually
+suffer from serious alignment problem due to its recurrency alignment operation,
+where the alignment relies on historical decoding results. To remedy this
+issue, we propose a decoupled attention network (DAN), which decouples the
+alignment operation from using historical decoding results. DAN is an effective,
+flexible and robust end-to-end text recognizer, which consists of three
+components: 1) a feature encoder that extracts visual features from the input
+image; 2) a convolutional alignment module that performs the alignment
+operation based on visual features from the encoder; and 3) a decoupled text
+decoder that makes final prediction by jointly using the feature map and
+attention maps. Experimental results show that DAN achieves state-of-the-art
+performance on multiple text recognition tasks, including offline handwritten
+text recognition and regular/irregular scene text recognition. Codes will be
+released.'
+MODELS:
+  Architecture:
+    - Attention
+  Learning Method:
+    - Supervised
+  Language Modality:
+    - Implicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213171943-35e9c57c-fdce-4866-91c4-a47dad9a7b3b.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: N/A
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: N/A
+  Experiment:
+    Training DataSets:
+      - MJ
+      - ST
+    Test DataSets:
+      Avg.: 86.0
+      IIIT5K:
+        WAICS: 94.3
+      SVT:
+        WAICS: 89.2
+      IC13:
+        WAICS: 93.9
+      IC15:
+        WAICS: 74.5
+      SVTP:
+        WAICS: 80.0
+      CUTE:
+        WAICS: 84.4
+Bibtex: '@inproceedings{wang2020decoupled,
+  title={Decoupled attention network for text recognition},
+  author={Wang, Tianwei and Zhu, Yuanzhi and Jin, Lianwen and Luo, Canjie and Chen, Xiaoxue and Wu, Yaqiang and Wang, Qianying and Cai, Mingxiang},
+  booktitle={Proceedings of the AAAI conference on artificial intelligence},
+  volume={34},
+  number={07},
+  pages={12216--12224},
+  year={2020}
+}'
diff --git a/paper_zoo/textrecog/Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition.yaml b/paper_zoo/textrecog/Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition.yaml
new file mode 100644
index 000000000..009c99b5a
--- /dev/null
+++ b/paper_zoo/textrecog/Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition.yaml
@@ -0,0 +1,76 @@
+Title: 'Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition'
+Abbreviation: ABINet
+Tasks:
+  - TextRecog
+Venue: CVPR
+Year: 2021
+Lab/Company:
+  - University of Science and Technology of China
+URL:
+  Venue: 'http://openaccess.thecvf.com/content/CVPR2021/html/Fang_Read_Like_Humans_Autonomous_Bidirectional_and_Iterative_Language_Modeling_for_CVPR_2021_paper.html'
+  Arxiv: 'https://arxiv.org/abs/2103.06495'
+Paper Reading URL: 'https://mp.weixin.qq.com/s/blBkim58-sUBR0EOxDvtvA'
+Code: 'https://github.com/FangShancheng/ABINet'
+Supported In MMOCR: 'https://github.com/open-mmlab/mmocr/tree/dev-1.x/configs/textrecog/abinet'
+PaperType:
+  - Algorithm
+Abstract: 'Linguistic knowledge is of great benefit to scene text recognition.
+However, how to effectively model linguistic rules in end-to-end deep
+networks remains a research challenge. In this paper, we argue that the
+limited capacity of language models comes from: 1) implicitly language
+modeling; 2) unidirectional feature representation; and 3) language model
+with noise input. Correspondingly, we propose an autonomous, bidirectional
+and iterative ABINet for scene text recognition. Firstly, the autonomous
+suggests to block gradient flow between vision and language models to enforce
+explicitly language modeling. Secondly, a novel bidirectional cloze network
+(BCN) as the language model is proposed based on bidirectional feature
+representation. Thirdly, we propose an execution manner of iterative correction
+for language model which can effectively alleviate the impact of noise input.
+Additionally, based on the ensemble of iterative predictions, we propose a
+self-training method which can learn from unlabeled images effectively.
+Extensive experiments indicate that ABINet has superiority on low-quality images
+and achieves state-of-the-art results on several mainstream benchmarks.
+Besides, the ABINet trained with ensemble self-training shows promising
+improvement in realizing human-level recognition. Code is available at
+https://github.com/FangShancheng/ABINet.'
+MODELS:
+  Architecture:
+    - Transformer
+  Learning Method:
+    - Supervised
+    - Semi-Supervised
+  Language Modality:
+    - Explicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213165915-d719091f-febe-4a57-b51f-a71b26afb543.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: N/A
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: N/A
+  Experiment:
+    Training DataSets:
+      - MJ
+      - ST
+    Test DataSets:
+      Avg.: 93.6
+      IIIT5K:
+        WAICS: 97.2
+      SVT:
+        WAICS: 95.5
+      IC13:
+        WAICS: 97.7
+      IC15:
+        WAICS: 86.9
+      SVTP:
+        WAICS: 89.9
+      CUTE:
+        WAICS: 94.1
+Bibtex: '@inproceedings{fang2021read,
+  title={Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition},
+  author={Fang, Shancheng and Xie, Hongtao and Wang, Yuxin and Mao, Zhendong and Zhang, Yongdong},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={7098--7107},
+  year={2021}
+}'
diff --git a/paper_zoo/textrecog/Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition.yaml b/paper_zoo/textrecog/Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition.yaml
new file mode 100644
index 000000000..325a0531f
--- /dev/null
+++ b/paper_zoo/textrecog/Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition.yaml
@@ -0,0 +1,72 @@
+Title: 'Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition'
+Abbreviation: SAR
+Tasks:
+  - TextRecog
+Venue: AAAI
+Year: 2019
+Lab/Company:
+  - Australian Centre for Robotic Vision, The University of Adelaide, Australia
+  - School of Computer Science, Northwestern Polytechnical University, China
+URL:
+  Venue: 'https://ojs.aaai.org/index.php/AAAI/article/view/4881'
+  Arxiv: 'https://arxiv.org/abs/1811.00751'
+Paper Reading URL: N/A
+Code: 'https://github.com/wangpengnorman/SAR-Strong-Baseline-for-Text-Recognition'
+Supported In MMOCR: 'https://github.com/open-mmlab/mmocr/tree/dev-1.x/configs/textrecog/sar'
+PaperType:
+  - Algorithm
+Abstract: 'Recognizing irregular text in natural scene images is challenging due
+to the large variance in text appearance, such as curvature, orientation and
+distortion. Most existing approaches rely heavily on sophisticated model designs
+and/or extra fine-grained annotations, which, to some extent, increase the
+difficulty in algorithm implementation and data collection. In this work, we
+propose an easy-to-implement strong baseline for irregular scene text
+recognition, using off-the-shelf neural network components and only word-level
+annotations. It is composed of a 31-layer ResNet, an LSTM-based encoder-decoder
+framework and a 2-dimensional attention module. Despite its simplicity, the
+proposed method is robust. It achieves state-of-the-art performance on irregular
+text recognition benchmarks and comparable results on regular text datasets.
+Code is available at: https://tinyurl.com/ShowAttendRead'
+MODELS:
+  Architecture:
+    - Attention
+  Learning Method:
+    - Supervised
+  Language Modality:
+    - Implicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213175678-efcb4452-b80b-4e0b-8d36-57fbfca2668b.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: N/A
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: N/A
+  Experiment:
+    Training DataSets:
+      - MJ
+      - ST
+      - Real
+    Test DataSets:
+      Avg.: 89.2
+      IIIT5K:
+        WAICS: 95.0
+      SVT:
+        WAICS: 91.2
+      IC13:
+        WAICS: 94.0
+      IC15:
+        WAICS: 78.8
+      SVTP:
+        WAICS: 86.4
+      CUTE:
+        WAICS: 89.6
+Bibtex: '@inproceedings{li2019show,
+  title={Show, attend and read: A simple and strong baseline for irregular text recognition},
+  author={Li, Hui and Wang, Peng and Shen, Chunhua and Zhang, Guyu},
+  booktitle={Proceedings of the AAAI conference on artificial intelligence},
+  volume={33},
+  number={01},
+  pages={8610--8617},
+  year={2019}
+}'
diff --git a/paper_zoo/textrecog/Text Recognition in the Wild: A Survey.yaml b/paper_zoo/textrecog/Text Recognition in the Wild: A Survey.yaml
new file mode 100644
index 000000000..63f824512
--- /dev/null
+++ b/paper_zoo/textrecog/Text Recognition in the Wild: A Survey.yaml
@@ -0,0 +1,41 @@
+Title: 'Text Recognition in the Wild: A Survey'
+Abbreviation: Chen et al.
+Tasks:
+  - TextRecog
+Venue: Others
+Year: 2021
+Lab/Company:
+  - College of Electronic and Information Engineering, South China University of Technology, China
+URL:
+  Venue: 'https://dl.acm.org/doi/abs/10.1145/3440756'
+  Arxiv: 'https://arxiv.org/abs/2005.03492'
+Paper Reading URL: N/A
+Code: 'https://github.com/HCIILAB/Scene-Text-Recognition'
+Supported In MMOCR: N/S
+PaperType:
+  - Survey
+Abstract: 'The history of text can be traced back over thousands of years. Rich
+and precise semantic information carried by text is important in a wide range of
+vision-based application scenarios. Therefore, text recognition in natural
+scenes has been an active research field in computer vision and pattern
+recognition. In recent years, with the rise and development of deep learning,
+numerous methods have shown promising in terms of innovation, practicality,
+and efficiency. This paper aims to (1) summarize the fundamental problems and
+the state-of-the-art associated with scene text recognition; (2) introduce new
+insights and ideas; (3) provide a comprehensive review of publicly available
+resources; (4) point out directions for future work. In summary, this
+literature review attempts to present the entire picture of the field of
+scene text recognition. It provides a comprehensive reference for people
+entering this field, and could be helpful to inspire future research.
+Related resources are available at our Github
+repository: https://github.com/HCIILAB/Scene-Text-Recognition.'
+Bibtex: '@article{chen2021text,
+  title={Text recognition in the wild: A survey},
+  author={Chen, Xiaoxue and Jin, Lianwen and Zhu, Yuanzhi and Luo, Canjie and Wang, Tianwei},
+  journal={ACM Computing Surveys (CSUR)},
+  volume={54},
+  number={2},
+  pages={1--35},
+  year={2021},
+  publisher={ACM New York, NY, USA}
+}'
diff --git a/paper_zoo/textrecog/Towards Accurate Scene Text Recognition with Semantic Reasoning Networks.yaml b/paper_zoo/textrecog/Towards Accurate Scene Text Recognition with Semantic Reasoning Networks.yaml
new file mode 100644
index 000000000..efd6da4bf
--- /dev/null
+++ b/paper_zoo/textrecog/Towards Accurate Scene Text Recognition with Semantic Reasoning Networks.yaml
@@ -0,0 +1,74 @@
+Title: 'Towards Accurate Scene Text Recognition with Semantic Reasoning Networks'
+Abbreviation: SRN
+Tasks:
+  - TextRecog
+Venue: CVPR
+Year: 2020
+Lab/Company:
+  - School of Artificial Intelligence, University of Chinese Academy of Sciences
+  - Department of Computer Vision Technology (VIS), Baidu Inc.
+URL:
+  Venue: 'http://openaccess.thecvf.com/content_CVPR_2020/html/Yu_Towards_Accurate_Scene_Text_Recognition_With_Semantic_Reasoning_Networks_CVPR_2020_paper.html'
+  Arxiv: 'https://arxiv.org/abs/2003.12294'
+Paper Reading URL: N/A
+Code: 'https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.6/configs/rec/rec_r50_fpn_srn.yml'
+Supported In MMOCR: N/S
+PaperType:
+  - Algorithm
+Abstract: 'Scene text image contains two levels of contents: visual texture and
+semantic information. Although the previous scene text recognition methods have
+made great progress over the past few years, the research on mining semantic
+information to assist text recognition attracts less attention, only RNN-like
+structures are explored to implicitly model semantic information. However, we
+observe that RNN based methods have some obvious shortcomings, such as
+time-dependent decoding manner and one-way serial transmission of semantic
+context, which greatly limit the help of semantic information and the
+computation efficiency. To mitigate these limitations, we propose a novel
+end-to-end trainable framework named semantic reasoning network (SRN) for
+accurate scene text recognition, where a global semantic reasoning module (GSRM)
+is introduced to capture global semantic context through multi-way parallel
+transmission. The state-of-the-art results on 7 public benchmarks, including
+regular text, irregular text and non-Latin long text, verify the effectiveness
+and robustness of the proposed method. In addition, the speed of SRN has
+significant advantages over the RNN based methods, demonstrating its value
+in practical use.'
+MODELS:
+  Architecture:
+    - Transformer
+  Learning Method:
+    - Supervised
+  Language Modality:
+    - Explicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213168893-7c600e03-c1f0-464a-8236-40ae26fbff89.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: N/A
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: N/A
+  Experiment:
+    Training DataSets:
+      - MJ
+      - ST
+    Test DataSets:
+      Avg.: 89.6
+      IIIT5K:
+        WAICS: 94.8
+      SVT:
+        WAICS: 91.5
+      IC13:
+        WAICS: 95.5
+      IC15:
+        WAICS: 82.7
+      SVTP:
+        WAICS: 85.1
+      CUTE:
+        WAICS: 87.8
+Bibtex: '@inproceedings{yu2020towards,
+  title={Towards accurate scene text recognition with semantic reasoning networks},
+  author={Yu, Deli and Li, Xuan and Zhang, Chengquan and Liu, Tao and Han, Junyu and Liu, Jingtuo and Ding, Errui},
+  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
+  pages={12113--12122},
+  year={2020}
+}'
diff --git a/paper_zoo/textrecog/What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis.yaml b/paper_zoo/textrecog/What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis.yaml
new file mode 100644
index 000000000..9318bc678
--- /dev/null
+++ b/paper_zoo/textrecog/What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis.yaml
@@ -0,0 +1,71 @@
+Title: 'What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis'
+Abbreviation: Baek et al.
+Tasks:
+  - TextRecog
+Venue: ICCV
+Year: 2019
+Lab/Company:
+  - Clova AI Research, NAVER/LINE Corp.
+URL:
+  Venue: 'http://openaccess.thecvf.com/content_ICCV_2019/html/Baek_What_Is_Wrong_With_Scene_Text_Recognition_Model_Comparisons_Dataset_ICCV_2019_paper.html'
+  Arxiv: 'https://arxiv.org/abs/1904.01906'
+Paper Reading URL: N/A
+Code: 'https://github.com/clovaai/deep-text-recognition-benchmark'
+Supported In MMOCR: N/S
+PaperType:
+  - Algorithm
+Abstract: 'Many new proposals for scene text recognition (STR) models have been
+introduced in recent years. While each claim to have pushed the boundary of the
+technology, a holistic and fair comparison has been largely missing in the field
+due to the inconsistent choices of training and evaluation datasets. This paper
+addresses this difficulty with three major contributions. First, we examine the
+inconsistencies of training and evaluation datasets, and the performance gap
+results from inconsistencies. Second, we introduce a unified four-stage STR
+framework that most existing STR models fit into. Using this framework allows
+for the extensive evaluation of previously proposed STR modules and the
+discovery of previously unexplored module combinations. Third, we analyze
+the module-wise contributions to performance in terms of accuracy, speed,
+and memory demand, under one consistent set of training and evaluation datasets.
+Such analyses clean up the hindrance on the current comparisons to understand
+the performance gain of the existing modules. Our code is publicly available.'
+MODELS:
+  Architecture:
+    - Attention
+    - CTC
+  Learning Method:
+    - Supervised
+  Language Modality:
+    - Implicit Language Model
+  Network Structure: 'https://user-images.githubusercontent.com/65173622/213169752-33203ec5-5602-44f0-8524-4ce77091dda8.png'
+  FPS:
+    DEVICE: N/A
+    ITEM: 35.3
+  FLOPS:
+    DEVICE: N/A
+    ITEM: N/A
+  PARAMS: 49.6M
+  Experiment:
+    Training DataSets:
+      - MJ
+      - ST
+    Test DataSets:
+      Avg.: 82.1
+      IIIT5K:
+        WAICS: 87.9
+      SVT:
+        WAICS: 87.5
+      IC13:
+        WAICS: 92.3
+      IC15:
+        WAICS: 71.8
+      SVTP:
+        WAICS: 79.2
+      CUTE:
+        WAICS: 74.0
+Bibtex: '@inproceedings{baek2019wrong,
+  title={What is wrong with scene text recognition model comparisons? dataset and model analysis},
+  author={Baek, Jeonghun and Kim, Geewook and Lee, Junyeop and Park, Sungrae and Han, Dongyoon and Yun, Sangdoo and Oh, Seong Joon and Lee, Hwalsuk},
+  booktitle={Proceedings of the IEEE/CVF international conference on computer vision},
+  pages={4715--4723},
+  year={2019}
+}'