Fine-Tuning Orientation Predictor Models #1698

hienphan161 · 2024-08-20T05:24:20Z

hienphan161
Aug 20, 2024

Hi team. Can I fine-tune the crop orientation predictor or page orientation predictor model, and then use my fine-tuned model with the ocr_predictor? The current models for predicting orientation aren't working very well, which sometimes makes the OCR results really bad.

felixdittrich92 · 2024-08-20T07:07:01Z

felixdittrich92
Aug 20, 2024
Maintainer

Hi @hienphan161 👋,

The scripts to train / finetune the orientation models can be found here:
PyTorch
TF

Types:
crop - requires a dataset of word crop images (like the recognition training data, but without annotations)
page - requires a dataset of full document images (like the detection training data, but without annotations)

Replacing the default models with your custom model is unfortunately not really intuitive atm.
We need to think about a better interface without blowing up the ocr_predictor too much 😅

import torch

from doctr.io import DocumentFile
from doctr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import OrientationPredictor
from doctr.models.preprocessor import PreProcessor

# Custom load and init predictors

custom_page_orientation_model = mobilenet_v3_small_page_orientation(pretrained=False)
page_params = torch.load(
    '/home/felix/.cache/doctr/models/mobilenet_v3_small_page_orientation-8e60325c.pt',
    map_location="cpu"
)
custom_page_orientation_model.load_state_dict(page_params)
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation(pretrained=False)
crop_params = torch.load(
    '/home/felix/.cache/doctr/models/mobilenet_v3_small_crop_orientation-f0847a18.pt',
    map_location="cpu"
)
custom_crop_orientation_model.load_state_dict(crop_params)

crop_predictor = OrientationPredictor(
    PreProcessor(
        custom_crop_orientation_model.cfg["input_shape"][-2:],
        preserve_aspect_ratio=True,
        symmetric_pad=True,
        batch_size=128,
    ),
    custom_crop_orientation_model,
)
page_predictor = OrientationPredictor(
    PreProcessor(
        custom_page_orientation_model.cfg["input_shape"][-2:],
        preserve_aspect_ratio=True,
        symmetric_pad=True,
        batch_size=3,
    ),
    custom_page_orientation_model,
)

# Normal inference

doc = DocumentFile.from_images("/home/felix/Desktop/doctr_test_data/3_90rot.jpg")


model = ocr_predictor(
    pretrained=True,
    assume_straight_pages=False,
    straighten_pages=True,
    detect_orientation=True,
).cuda().half()

# Overwrite the default orientation models
model.crop_orientation_predictor = crop_predictor
model.page_orientation_predictor = page_predictor

preds = model(doc)
preds.show()


felixdittrich92 · 2024-08-20T07:13:40Z

felixdittrich92
Aug 20, 2024
Maintainer

As a first step, a quick way to make this easier for users could look like this:

import torch

from doctr.io import DocumentFile
from doctr.models import ocr_predictor, mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor

# Custom load and init predictors

custom_page_orientation_model = mobilenet_v3_small_page_orientation(pretrained=False)
page_params = torch.load(
    '/home/felix/.cache/doctr/models/mobilenet_v3_small_page_orientation-8e60325c.pt',
    map_location="cpu"
)
custom_page_orientation_model.load_state_dict(page_params)
custom_crop_orientation_model = mobilenet_v3_small_crop_orientation(pretrained=False)
crop_params = torch.load(
    '/home/felix/.cache/doctr/models/mobilenet_v3_small_crop_orientation-f0847a18.pt',
    map_location="cpu"
)
custom_crop_orientation_model.load_state_dict(crop_params)


doc = DocumentFile.from_images("/home/felix/Desktop/doctr_test_data/3_90rot.jpg")

model = ocr_predictor(
    pretrained=True,
    assume_straight_pages=False,
    straighten_pages=True,
    detect_orientation=True,
).cuda().half()

# Overwrite the default orientation models
model.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
model.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)

preds = model(doc)
preds.show()

This would require only a small change.

What do you think? Would this be helpful, or is it still too complex?


hienphan161 · 2024-08-27T10:15:37Z

hienphan161
Aug 27, 2024
Author

The second one looks fine to me. Thanks. Another thing: can I turn off the crop orientation detection, as it sometimes causes bad results? Also, do we have any benchmarks for these orientation detection models?

9 replies

hienphan161 Oct 4, 2024
Author

@felixdittrich92 So which dataset from the list of recognition datasets here did you use to train your page orientation detection model and crop orientation detection model?

felixdittrich92 Oct 4, 2024
Maintainer

Hi @hienphan161 👋,

The page orientation model was trained on the Mindee internal dataset we use for detection model pretraining, and the crop orientation model on the internal recognition dataset.

BTW: We added functionality to disable the specific orientation models (this may help in your case; it is currently only available on the main branch).
See: https://mindee.github.io/doctr/latest/using_doctr/using_models.html#advanced-options
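
For readers following along, here is a minimal sketch of what turning off only the crop orientation stage might look like. The keyword names `disable_crop_orientation` and `disable_page_orientation` are the ones used later in this thread. The `ORIENTATION_OPTIONS` dict and the `build_predictor` helper are hypothetical names introduced only for illustration, and the flags may not exist in older docTR releases.

```python
# Keyword arguments taken from this thread; they were only on docTR's main
# branch when the reply above was written, so check your installed version.
ORIENTATION_OPTIONS = {
    "assume_straight_pages": False,
    "disable_crop_orientation": True,   # skip the crop orientation model
    "disable_page_orientation": False,  # keep the page orientation model
}


def build_predictor(options=None):
    """Build an ocr_predictor with the chosen orientation stages disabled.

    Hypothetical helper, not part of docTR.
    """
    # Imported lazily so the options above can be inspected without docTR.
    from doctr.models import ocr_predictor

    return ocr_predictor(pretrained=True, **(options or ORIENTATION_OPTIONS))
```

Calling `build_predictor()` would then return a predictor that still straightens pages but no longer re-orients individual word crops, assuming the installed version accepts these flags.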

hienphan161 Oct 4, 2024
Author

Great. Thanks. Any plans on when to release a new version? Also, in case I want to fine tune the orientation models, how much data will I need?

hienphan161 Oct 7, 2024
Author

Hi @felixdittrich92, in case I want to fine tune the orientation models, how much data will I need?

felixT2K Oct 7, 2024

Hi @hienphan161, I think we will do the next release in December :)
It depends a bit on the diversity of the docs you need the model for, but in general I think ~100 images (all straight; they are auto-augmented in the pipeline) should be a good starting point for fine-tuning a page_orientation_model :)
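
To illustrate why straight images are enough, here is a rough sketch of the general idea behind rotation auto-augmentation: each straight image is rotated by a random multiple of 90 degrees, and the rotation index becomes the training label. This is not docTR's actual pipeline. The function names and the label encoding (k means k × 90 degrees counter-clockwise) are assumptions made for illustration, and images are plain nested lists to keep the example dependency-free (with NumPy, `np.rot90` does the same rotation).

```python
import random


def rotate90_ccw(image):
    """Rotate a row-major 2D grid 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*image)][::-1]


def make_orientation_sample(image, rng=random):
    """Return (rotated_image, label) from a straight image.

    The label k means the image was rotated k * 90 degrees counter-clockwise
    (a hypothetical encoding, chosen only for this sketch).
    """
    k = rng.randrange(4)
    rotated = image
    for _ in range(k):
        rotated = rotate90_ccw(rotated)
    return rotated, k


# A tiny 2x3 "image": every straight input can yield four labelled variants.
straight = [[1, 2, 3],
            [4, 5, 6]]
sample, label = make_orientation_sample(straight, random.Random(0))
```

Because the labels are generated from the rotation itself, no manual annotation is needed, which matches the "without annotations" dataset requirement mentioned earlier in the thread.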

govindbagaria · 2024-10-29T09:43:40Z

govindbagaria
Oct 29, 2024

Hello @felixdittrich92

I'm currently integrating custom models into the ocr_predictor for various components including detection, recognition, and orientation. My setup involves loading custom-trained models for detection (**db_mobilenet_v3_large**) and recognition (**crnn_vgg16_bn**) as well as orientation models (**mobilenet_v3_small_page_orientation** and **mobilenet_v3_small_crop_orientation**) from specific paths.

Here's a snippet of my implementation:

from doctr.models import ocr_predictor, detection, recognition
from doctr.models import mobilenet_v3_small_page_orientation, mobilenet_v3_small_crop_orientation
from doctr.models.classification.zoo import crop_orientation_predictor, page_orientation_predictor
import torch

# Paths to the model weights
detection_model_path = 'path/to/detection_model.pt'
recognition_model_path = 'path/to/recognition_model.pt'
page_orientation_model_path = 'path/to/page_orientation_model.pt'
crop_orientation_model_path = 'path/to/crop_orientation_model.pt'

# Load detection model
det_model = detection.db_mobilenet_v3_large(pretrained=False, pretrained_backbone=False)
det_model.load_state_dict(torch.load(detection_model_path, map_location="cpu"))

# Load recognition model
reco_model = recognition.crnn_vgg16_bn(pretrained=False, pretrained_backbone=False)
reco_model.load_state_dict(torch.load(recognition_model_path, map_location="cpu"))

# Custom orientation models
custom_page_orientation_model = mobilenet_v3_small_page_orientation(pretrained=False)
page_params = torch.load(page_orientation_model_path, map_location="cpu")
custom_page_orientation_model.load_state_dict(page_params)

custom_crop_orientation_model = mobilenet_v3_small_crop_orientation(pretrained=False)
crop_params = torch.load(crop_orientation_model_path, map_location="cpu")
custom_crop_orientation_model.load_state_dict(crop_params)

# Initialize OCR predictor
model = ocr_predictor(det_arch=det_model, reco_arch=reco_model, assume_straight_pages=False, disable_crop_orientation=False)

# Assign custom orientation models
model.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
model.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)

While the detection and recognition models load as expected without downloading the default models, the orientation models still trigger downloads when I instantiate the ocr_predictor. I've attempted to disable the default orientation models using the disable_crop_orientation=False parameter but it seems not to affect the downloads.

Could you please advise on how to properly configure the ocr_predictor so that it doesn't attempt to download the default orientation models? Is there a parameter or configuration step I am missing here?

Thank you in advance for your assistance!

2 replies

felixdittrich92 Oct 30, 2024
Maintainer

Hi @govindbagaria 👋,

Yeah, I see the issue. A quick and dirty trick to avoid the download:

# Initialize OCR predictor
model = ocr_predictor(det_arch=det_model, reco_arch=reco_model, assume_straight_pages=False, disable_crop_orientation=True, disable_page_orientation=True)

# Assign custom orientation models
model.crop_orientation_predictor = crop_orientation_predictor(custom_crop_orientation_model)
model.page_orientation_predictor = page_orientation_predictor(custom_page_orientation_model)

# Reenable orientation detection after loading custom models
model._page_orientation_disabled = False
model._crop_orientation_disabled = False

govindbagaria Oct 30, 2024

Thanks a lot @felixdittrich92. The above trick works.
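
The workaround from the reply above can be wrapped in a small helper so that the assignment and the re-enabling always happen together. This is only a sketch: `use_custom_orientation` is a hypothetical name, and it relies on the private attributes `_crop_orientation_disabled` and `_page_orientation_disabled` shown in the thread, which may change between docTR versions. A stand-in object is used below so the snippet runs without docTR installed.

```python
from types import SimpleNamespace


def use_custom_orientation(predictor, crop_predictor=None, page_predictor=None):
    """Attach custom orientation predictors and re-enable the matching stage.

    `predictor` is expected to be an OCR predictor created with
    disable_crop_orientation=True / disable_page_orientation=True, so that
    the default orientation weights are not downloaded.
    """
    if crop_predictor is not None:
        predictor.crop_orientation_predictor = crop_predictor
        predictor._crop_orientation_disabled = False  # private attribute
    if page_predictor is not None:
        predictor.page_orientation_predictor = page_predictor
        predictor._page_orientation_disabled = False  # private attribute
    return predictor


# Stand-in for a real OCR predictor, only to show the effect of the helper.
demo = SimpleNamespace(_crop_orientation_disabled=True, _page_orientation_disabled=True)
use_custom_orientation(demo, crop_predictor="my-crop-predictor")
print(demo._crop_orientation_disabled, demo._page_orientation_disabled)  # False True
```

With a real predictor, the call would presumably look like `use_custom_orientation(model, crop_orientation_predictor(custom_crop_orientation_model), page_orientation_predictor(custom_page_orientation_model))`, using the names from the snippets above.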
