Dataset not well formatted? #1737
I'm trying to train a recognition model with
It gives me this error:
Any idea?
Seems like I have badly formatted the labels.json.
My original dataset is in PAGE-XML format that has tuples of polygon coordinates.
I wrote some code to extract the polygons and convert them to docTR labels. Should I do anything else to the original polygons?
This is how I "convert" my tuples to doctr polygons, is that wrong?
I think you were right; the code below should do it:
Output:
Anyway, my original dataset had something like this:
Should this still doing in training a
Oh... that explains a lot. Unfortunately, almost all my datasets are line-level segmentation/detection and recognition.
Hi @johnlockejrr 👋,
The values are already absolute so that's fine 👍
But what you have is a multi-point polygon and doctr requires a 4-point polygon as label :)
So all you have to do is extract the top-left, top-right, bottom-right and bottom-left points from each polygon label to get
Then the polygons key in the labels.json contains all the annotations for 1 image:
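
For illustration only, here is a minimal sketch of reducing a multi-point polygon to 4 points. It is not taken from docTR itself. It assumes the simplest interpretation, an axis-aligned bounding box built from the min/max coordinates, and the function name `polygon_to_quad` and the sample coordinates are hypothetical:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def polygon_to_quad(points: Sequence[Point]) -> List[List[float]]:
    """Reduce a multi-point polygon to 4 corner points.

    Assumption: the axis-aligned bounding box of the polygon is an
    acceptable 4-point label. Corners are returned in the order
    top-left, top-right, bottom-right, bottom-left.
    """
    if len(points) < 2:
        raise ValueError("need at least two points to build a box")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    return [[x_min, y_min], [x_max, y_min], [x_max, y_max], [x_min, y_max]]


# Hypothetical line polygon with 6 points in absolute pixel coordinates
line_polygon = [(10, 20), (60, 18), (120, 22), (118, 50), (58, 47), (12, 52)]

# One 4-point polygon per text line; the list holds every line of one image
polygons = [polygon_to_quad(line_polygon)]
print(polygons)
```

For strongly rotated or skewed lines an axis-aligned box can be loose; a rotated minimum-area rectangle (for example via OpenCV's `cv2.minAreaRect` and `cv2.boxPoints`) may fit better, at the cost of an extra dependency.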