Fine-Tuning Orientation Predictor Models #1698
Replies: 4 comments 11 replies
-
Hi @hienphan161 👋, the scripts to train / fine-tune the orientation models can be found here: types: Replacing the defaults with your custom model is unfortunately not very intuitive at the moment.
-
As a first step, a quick way to make this easier for users could look like:
This would require only a small change. What do you think? Would this be helpful, or is it still too complex?
-
The second one looks fine to me. Thanks. Another thing: can I turn off crop orientation detection, since it sometimes causes bad results? Also, do we have any benchmarks for these orientation detection models?
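On the benchmark question, one way to get numbers for your own data is a small evaluation loop like the sketch below. It is not part of docTR: `evaluate_orientation` and the `predict` callable are hypothetical names, and it assumes the model's output can be reduced to a single angle in degrees per image (negative angles such as -90 are normalised to 270).

```python
from collections import Counter
from typing import Any, Callable, Iterable, Tuple


def evaluate_orientation(
    samples: Iterable[Tuple[Any, int]],
    predict: Callable[[Any], int],
) -> Tuple[float, Counter]:
    """Return (accuracy, confusion counts) for an orientation classifier.

    samples: iterable of (image, true_angle_in_degrees) pairs.
    predict: hypothetical callable mapping one image to a predicted angle.
    """
    confusion: Counter = Counter()
    correct = 0
    total = 0
    for image, true_angle in samples:
        # Normalise both angles to [0, 360) so that -90 and 270 compare equal
        expected = true_angle % 360
        predicted = predict(image) % 360
        confusion[(expected, predicted)] += 1
        correct += int(predicted == expected)
        total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, confusion
```

The confusion counter is keyed by `(expected, predicted)`, so it shows which rotations get mixed up (for example 0 vs 180), which is usually more informative than the accuracy alone when deciding whether to fine-tune or to disable the orientation step.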
-
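Hello @felixdittrich92, I'm currently integrating custom models into the ocr_predictor for various components, including detection, recognition, and orientation. My setup involves loading custom-trained models for detection ( Here's a snippet of my implementation:
from doctr.models import ocr_predictor, detection, recognition
# Orientation architectures used further down (this import was missing from the snippet)
from doctr.models.classification import mobilenet_v3_small_crop_orientation, mobilenet_v3_small_page_orientation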
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor
import torch
# Paths to the model weights
detection_model_path = 'path/to/detection_model.pt'
recognition_model_path = 'path/to/recognition_model.pt'
page_orientation_model_path = 'path/to/page_orientation_model.pt'
crop_orientation_model_path = 'path/to/crop_orientation_model.pt'
# Load detection model
det_model = detection.db_mobilenet_v3_large(pretrained=False, pretrained_backbone=False)
det_model.load_state_dict(torch.load(detection_model_path, map_location="cpu"))
# Load recognition model
reco_model = recognition.crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
reco_model.load_state_dict(torch.load(recognition_model_path, map_location="cpu"))
# Custom orientation models
custom_page_orientation_model = mobilenet_v3_small_page_orientation(pretrained=False)
page_params = torch.load(page_orientation_model_path, map_location="cpu")
custom_page_orientation_model.load_state_dict(page_params)
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation(pretrained=False)
crop_params = torch.load(crop_orientation_model_path, map_location="cpu")
custom_crop_orientation_model.load_state_dict(crop_params)
# Initialize OCR predictor
model = ocr_predictor(det_arch=det_model, reco_arch=reco_model, assume_straight_pages=False, disable_crop_orientation=False)
# Assign custom orientation models
model.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
model.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)
While the detection and recognition models load as expected without downloading the default models, the orientation models still trigger downloads when I instantiate the Could you please advise on how to properly configure the Thank you in advance for your assistance!
-
Hi team. Can I fine-tune the crop orientation predictor or page orientation predictor model, and then use my fine-tuned model with the ocr_predictor? The current models for predicting orientation aren't working very well, which sometimes makes the OCR results really bad.