deu_latf wordfile #383
The corresponding training data is available at https://github.com/tesseract-ocr/langdata_lstm/tree/main/deu_latf. For the basic meaning of the files, see for example https://groups.google.com/g/tesseract-ocr/c/U9mysQuhRpU/m/7aNrZACXBQAJ.
Don't use deu_latf for Fraktur. Try https://zenodo.org/records/10125246 instead. More models are available here: https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/.
See also https://ocr-bw.bib.uni-mannheim.de/faq/ (German).
Then thank you very much, Mr. Weil, and I wish you continued success with your program! ;)
I tried to recognize some old Fraktur texts with deu_latf, but many words were not recognized correctly, so I extracted the word list from deu_latf. The file does not seem to store the words as plain text. Example entries:
A-{d}-{cd°s}%-
A-{d}-{cd°a}%
A-{d}-{c-%
A-{d}s§gi
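A common way to get a readable word list out of a `.traineddata` file is to unpack it with Tesseract's `combine_tessdata -u` and then decode the `lstm-word-dawg` component with `dawg2wordlist`, passing the matching `lstm-unicharset`. Entries like the ones above are what one might expect if the DAWG is decoded with a unicharset other than the one it was built against, but that is my assumption and has not been verified for this file. The sketch below only assembles and prints the commands (a dry run). It assumes the model contains `lstm-word-dawg` and `lstm-unicharset` components; the output name `deu_latf.wordlist` is an illustrative choice.

```python
import shlex
import subprocess


def extract_wordlist_commands(model: str = "deu_latf") -> list[list[str]]:
    """Build the commands that unpack MODEL.traineddata and decode its word DAWG."""
    prefix = f"{model}."  # combine_tessdata -u writes files named <prefix><component>
    return [
        # Unpack all components (lstm, lstm-unicharset, lstm-word-dawg, ...).
        ["combine_tessdata", "-u", f"{model}.traineddata", prefix],
        # Decode the DAWG with the LSTM unicharset (assumed to be the one it was built with).
        ["dawg2wordlist", f"{prefix}lstm-unicharset", f"{prefix}lstm-word-dawg",
         f"{model}.wordlist"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(extract_wordlist_commands())  # dry run: only prints the commands
```

Setting `dry_run=False` runs the tools for real, which requires the Tesseract training tools on `PATH`.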
I then extracted a readable version and realized that many more words could be added. I would also like to improve the recognition of short words such as "ich", "schon", "noch", etc., because output like "bat ned)" (for "hat noch") or "bod)" (for "doch") is not of much use.
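The misreadings quoted above follow a visible pattern: `h` and `d` read as `b`, `o` read as `e`, and the Fraktur `ch` read as `d)`. A word list only biases recognition; as a separate and purely hypothetical post-processing step, one could generate candidates by undoing such confusions and keep those found in a lexicon. This is not a Tesseract feature. The substitution table is derived only from the three examples in this comment, and the lexicon is a toy set for illustration.

```python
from itertools import product

# Hypothetical confusion table: OCR output fragment -> plausible intended text.
# Derived only from "bat ned)" (hat noch) and "bod)" (doch).
CONFUSIONS = {
    "d)": ["d)", "ch"],
    "b": ["b", "h", "d"],
    "e": ["e", "o"],
}


def segment(token: str) -> list[str]:
    """Split a token into pieces, preferring the longest confusion key at each position."""
    keys = sorted(CONFUSIONS, key=len, reverse=True)
    pieces, i = [], 0
    while i < len(token):
        for key in keys:
            if token.startswith(key, i):
                pieces.append(key)
                i += len(key)
                break
        else:  # no confusion key starts here: keep the character as-is
            pieces.append(token[i])
            i += 1
    return pieces


def candidates(token: str, lexicon: set[str]) -> list[str]:
    """Return lexicon words reachable by undoing the confusions in CONFUSIONS."""
    options = [CONFUSIONS.get(p, [p]) for p in segment(token)]
    found = {"".join(choice) for choice in product(*options)}
    return sorted(w for w in found if w in lexicon)


if __name__ == "__main__":
    lexicon = {"hat", "noch", "doch", "ich", "schon"}  # toy lexicon
    for tok in ["bat", "ned)", "bod)"]:
        print(tok, "->", candidates(tok, lexicon))
```

The number of candidates grows exponentially with the number of confusable pieces, so a real table would need to stay small or be combined with frequency ranking.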
Is there a README or other documentation that explains this file and how to extend it?
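With Tesseract's training tools, extending the list is usually a round trip: add words to the decoded word list, rebuild the DAWG with `wordlist2dawg` against the same `lstm-unicharset`, and write the new component back into the model with `combine_tessdata -o`. The sketch below merges new words (blanks and duplicates dropped, sorted by code point) and prints the rebuild commands as a dry run. The filenames follow the `combine_tessdata -u` prefix convention and are assumptions; since `-o` modifies the `.traineddata` file in place, keep a backup.

```python
import shlex
import subprocess


def merge_words(existing: list[str], additions: list[str]) -> list[str]:
    """Combine two word lists, dropping blanks and duplicates, sorted by code point."""
    words = {w.strip() for w in existing + additions}
    words.discard("")
    return sorted(words)


def rebuild_commands(model: str = "deu_latf") -> list[list[str]]:
    """Commands to rebuild the word DAWG and write it back into MODEL.traineddata."""
    prefix = f"{model}."
    return [
        # wordlist2dawg <word list> <output dawg> <unicharset>
        ["wordlist2dawg", f"{model}.wordlist", f"{prefix}lstm-word-dawg",
         f"{prefix}lstm-unicharset"],
        # -o replaces the named component inside the .traineddata file.
        ["combine_tessdata", "-o", f"{model}.traineddata", f"{prefix}lstm-word-dawg"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    print(merge_words(["doch", "hat"], ["noch", "hat", "schon"]))
    run(rebuild_commands())  # dry run: only prints the commands
```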