Detection prefers to include dots of i's from the line underneath #1562

Open
rmast opened this issue Apr 22, 2024 · 0 comments
Labels
type: bug Something isn't working

rmast commented Apr 22, 2024

Bug description

When I run the detection predictor on the following image:
[image: Brief gemeente 300dpi voorkant]
some of the detected word boxes also enclose dots that belong to the i's on the text line below the box (see the picture under "Error traceback").

Code snippet to reproduce the bug

import matplotlib.pyplot as plt
import matplotlib.patches as patches
from doctr.models import detection_predictor
from doctr.io import DocumentFile



def visualize_word_boxes(image_path, word_boxes):
    # Load the image
    image = plt.imread(image_path)

    # Get image dimensions (slicing also works for grayscale images without a channel axis)
    image_height, image_width = image.shape[:2]

    # Create figure and axes
    fig, ax = plt.subplots()
    ax.imshow(image)

    # Plot word boxes
    for box in word_boxes:
        # Convert normalized coordinates to absolute pixel values
        x1 = int(box[0] * image_width)
        y1 = int(box[1] * image_height)
        x2 = int(box[2] * image_width)
        y2 = int(box[3] * image_height)

        # Create a rectangle patch
        rect = patches.Rectangle((x1, y1), x2 - x1, y2 - y1, linewidth=1, edgecolor='r', facecolor='none')

        # Add the patch to the Axes
        ax.add_patch(rect)

    # Show the plot
    plt.show()

# Path to the scanned letter used in this report
image_path = "/home/rmast/Downloads/Brief gemeente 300dpi voorkant.jpg"

# Run the pretrained DBNet detector on the image
model = detection_predictor(arch='db_resnet50', pretrained=True)
doc = DocumentFile.from_images(image_path)
result = model(doc)
# One entry per page; each row under 'words' starts with relative xmin, ymin, xmax, ymax
word_boxes = result[0]['words']
visualize_word_boxes(image_path, word_boxes)
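
To look at a single suspicious box up close, a small cropping helper can be handy. This is only a sketch: `crop_word_box` and its `pad` parameter are hypothetical names, not part of docTR, and it assumes each box starts with relative `xmin, ymin, xmax, ymax` as in the snippet above.

```python
import numpy as np


def crop_word_box(image, box, pad=4):
    """Return the pixel region of one relative box, padded by `pad` pixels.

    Assumption: box[:4] is (xmin, ymin, xmax, ymax) in relative coordinates.
    """
    height, width = image.shape[:2]
    xmin, ymin, xmax, ymax = box[:4]
    # Convert relative coordinates to pixels and clamp to the image bounds
    x1 = max(int(xmin * width) - pad, 0)
    y1 = max(int(ymin * height) - pad, 0)
    x2 = min(int(xmax * width) + pad, width)
    y2 = min(int(ymax * height) + pad, height)
    return image[y1:y2, x1:x2]
```

For example, `plt.imshow(crop_word_box(plt.imread(image_path), word_boxes[0]))` would show the first detected box with a few pixels of context around it.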

Error traceback

Detected boxes
In "Op meerdere plaatsen [op]", the box around [op] contains a dot from the line below.
In "kruisingen [op] het Kerkplein", the box around [op] also contains a dot from the line below.
In "We [gaan] de kruisingen", the box around [gaan] also contains a dot from the line below.
The descenders of p and g appear to increase the risk of this happening.
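
One rough way to find the affected boxes without inspecting the picture by eye is to flag boxes that are noticeably taller than the typical box on the page, since a box that swallows a dot from the next line has to extend further down. The sketch below is a heuristic under stated assumptions, not a docTR feature: `flag_tall_boxes` and the `ratio` threshold are hypothetical, and the rows are assumed to start with relative `xmin, ymin, xmax, ymax`.

```python
import numpy as np


def flag_tall_boxes(word_boxes, ratio=1.3):
    """Return indices of boxes whose height exceeds `ratio` times the median height.

    Assumption: each row starts with relative (xmin, ymin, xmax, ymax).
    """
    boxes = np.asarray(word_boxes, dtype=float)
    if boxes.size == 0:
        return np.array([], dtype=int)
    # Box height is the difference between the bottom and top edges
    heights = boxes[:, 3] - boxes[:, 1]
    return np.flatnonzero(heights > ratio * np.median(heights))
```

Calling `flag_tall_boxes(word_boxes)` on the result above would give candidate indices to check. Words set in a larger font (headings, for instance) would be flagged as well, so the output is a starting point for inspection rather than a count of the bug.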

Environment

DocTR version: v0.8.1
TensorFlow version: N/A
PyTorch version: 2.2.2 (torchvision 0.17.2)
OpenCV version: 4.9.0
OS: Linux Mint 20.3
Python version: 3.12.3
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.1.66
GPU models and configuration: GPU 0: NVIDIA GeForce GT 1030
Nvidia driver version: 535.86.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.3
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.3

Deep Learning backend

Python 3.12.3 | packaged by Anaconda, Inc. | (main, Apr 19 2024, 16:50:38) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> from doctr.file_utils import is_tf_available, is_torch_available
>>> print(f"is_tf_available: {is_tf_available()}")
is_tf_available: False
>>> print(f"is_torch_available: {is_torch_available()}")
is_torch_available: True

rmast added the "type: bug" label Apr 22, 2024