Replies: 1 comment
Looks like that's more of a discussion topic.
🚀 The feature
Hello all,
It is very likely that the data you use to train the text recognition models contains labeling errors, and that training on them limits model performance. I know that the data you train the models on is private and well protected. That's why I would like to offer you a script that would let you correct these errors yourselves. Given the large amount of data you have available for training, this could even lead to higher performance than the current model benchmarks.
The principle is based on two things: the predictions of the models and their associated labels. To put it simply, if we sort the predictions in descending order of prediction score and look, in that same order, at the labels considered incorrect, this would highlight the poorly labeled data that has the most negative impact on model learning.
So, if you agree, I would be very happy to work on this script to help you get better-quality data.
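To make the idea concrete, here is a minimal sketch of the ranking step in plain Python. It is only an illustration under a few assumptions of mine: a label is "considered incorrect" when it differs from the model's prediction, each prediction comes with a single confidence score, and the `Sample` class, the `rank_suspect_labels` function, and the file names are hypothetical, not part of any existing library.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    path: str     # hypothetical identifier of the image crop
    label: str    # transcription stored in the dataset
    pred: str     # transcription predicted by the model
    score: float  # model confidence for the prediction, in [0, 1]


def rank_suspect_labels(samples):
    """Keep samples whose label disagrees with the prediction,
    sorted by descending prediction score."""
    # Assumption: a label is "considered incorrect" when it differs
    # from the prediction (exact string comparison).
    mismatches = [s for s in samples if s.pred != s.label]
    # A confident prediction that contradicts the label is the most
    # likely sign of a labeling error, so those come first.
    return sorted(mismatches, key=lambda s: s.score, reverse=True)


# Made-up example data, for illustration only.
samples = [
    Sample("img_001.png", "hello", "hello", 0.99),
    Sample("img_002.png", "wor1d", "world", 0.97),
    Sample("img_003.png", "train", "tram", 0.41),
    Sample("img_004.png", "data", "date", 0.88),
]

suspects = rank_suspect_labels(samples)
for s in suspects:
    print(f"{s.score:.2f}  {s.path}  label={s.label!r}  pred={s.pred!r}")
```

The top of the resulting list is where a human reviewer would start: high-score disagreements are more likely to be label errors, while low-score ones are more likely to be genuine model mistakes.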
Motivation, pitch
Correcting the labels in the data leads to better model performance.
Alternatives
No response
Additional context
No response