Poor performance of fine-tuned OCR recognition model #1782
Unanswered
stevemanavalan asked this question in Q&A
-
Hey @stevemanavalan 👋,
Did you use the main SynthTiger repo to generate the dataset? Maybe give this one a try:
branch:
The generated dataset can be downloaded here, and the model I fine-tuned with it: https://huggingface.co/Felix92/doctr-torch-parseq-multilingual-v1
Best,
PS: the font arg has no effect if you provide a train and val path :) font is only required if you use the integrated WordGenerator
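For reference, the integrated generator path looks roughly like the sketch below; argument names are from memory and may differ slightly between docTR versions, so double-check against the version you have installed:

```python
# Minimal sketch of docTR's on-the-fly WordGenerator, the only code path where
# the --font argument is used (argument names may differ between versions).
from doctr.datasets import VOCABS, WordGenerator

train_set = WordGenerator(
    vocab=VOCABS["german"],        # same vocab you select with --vocab
    min_chars=1,
    max_chars=32,
    num_samples=100_000,
    font_family=["FreeSans.ttf", "FreeMono.ttf"],  # this is where --font ends up
)

img, target = train_set[0]  # synthetic word crop and its label string
```

If you pass --train_path/--val_path, your pre-rendered SynthTiger crops are loaded instead and the fonts above are never used.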
-
I am trying to fine-tune the OCR recognition model (crnn_vgg16_bn) and I have already looked at discussions #1677 and #1366. I have tried the following:
However, the model performs poorly during inference when I set the scale to 1 in
DocumentFile.from_pdf(bytes_pdf, scale=1)
compared with https://huggingface.co/tilman-rassy/doctr-crnn-vgg16-bn-fascan-v1, which was trained on ~100K samples. When the scale is set to 2 during inference, both models perform comparably. I have also tried reducing the image quality and adding noise to the dataset generated with SynthTiger, with no visible improvement to the recognition model. Can someone please assist me in improving my recognition model's performance?
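My inference setup looks roughly like the sketch below. The checkpoint path and the db_resnet50 detector are placeholders for my local setup, and I am assuming a recognition model instance can be passed directly as reco_arch, which is how I understand the current docTR API:

```python
# Rough sketch of my inference setup (checkpoint path and detector are placeholders).
import torch

from doctr.datasets import VOCABS
from doctr.io import DocumentFile
from doctr.models import crnn_vgg16_bn, ocr_predictor

# Load the fine-tuned recognition weights into the crnn_vgg16_bn architecture
reco_model = crnn_vgg16_bn(pretrained=False, vocab=VOCABS["german"])
reco_model.load_state_dict(torch.load("doctr_crnn_vgg16_bn.pt", map_location="cpu"))

# Build the end-to-end predictor with a default detector and my recognition model
predictor = ocr_predictor(det_arch="db_resnet50", reco_arch=reco_model, pretrained=True)

with open("sample.pdf", "rb") as f:
    bytes_pdf = f.read()

# scale=1 gives poor results with my model; scale=2 is comparable to the fascan model
pages = DocumentFile.from_pdf(bytes_pdf, scale=1)
result = predictor(pages)
print(result.render())
```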
Training args:
python references/recognition/train_pytorch.py crnn_vgg16_bn --train_path "train_set" --val_path "val_set" --epochs 100 --vocab german --name doctr_crnn_vgg16_bn --pretrained --b 400 --wb --font "1942.ttf,FreeSans.ttf,LiberationMono-BoldItalic.ttf,LiberationMono-Italic.ttf,rm_typerighter.ttf,FreeMono.ttf,FreeSerif.ttf,LiberationMono-Bold.ttf,LiberationMono-Regular.ttf"
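The "reducing the quality" part of the experiment mentioned above was an offline pass over the SynthTiger crops, roughly along the lines of the sketch below (the noise injection is omitted here; directory names are placeholders for my local layout, and labels.json is reused unchanged since the filenames stay the same):

```python
# Rough sketch of the offline degradation pass: downscale, JPEG-compress and
# upscale each SynthTiger crop to mimic the softer glyphs of scale=1 PDF rendering.
# Directory names are placeholders; labels.json is copied over unchanged.
import io
import random
from pathlib import Path

from PIL import Image


def degrade(img: Image.Image) -> Image.Image:
    w, h = img.size
    factor = random.uniform(0.4, 0.7)  # pretend the page was rendered at a lower DPI
    small = img.resize((max(1, int(w * factor)), max(1, int(h * factor))), Image.BILINEAR)
    buf = io.BytesIO()
    small.convert("RGB").save(buf, format="JPEG", quality=random.randint(40, 75))
    buf.seek(0)
    return Image.open(buf).resize((w, h), Image.BILINEAR)


src, dst = Path("train_set/images"), Path("train_set_degraded/images")
dst.mkdir(parents=True, exist_ok=True)
for path in src.glob("*.png"):
    degrade(Image.open(path)).save(dst / path.name)
```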