deu_latf wordfile #383

Open
Stond0cyborg opened this issue Apr 1, 2024 · 4 comments

Comments

@Stond0cyborg

I tried to recognize some old Fraktur texts with deu_latf, but many words are not recognized correctly, so I extracted the word list from deu_latf. This file seems to be used for word recognition.
Example: A-{d}-{cd°s}%-
A-{d}-{cd°a}%
A-{d}-{c-%
A-{d}s§gi
I then extracted the readable version and realized that many more words could be added. I would also like to improve the recognition of words such as "ich", "schon", and "noch", because there is not much you can do with output like "bat ned)" (for "hat noch") or "bod)" (for "doch").

Is there a README for this file, or some other explanation of how to extend it?

@stefan6419846
Contributor

The corresponding training data is available at https://github.com/tesseract-ocr/langdata_lstm/tree/main/deu_latf. For the basic meaning of the files, see, for example, https://groups.google.com/g/tesseract-ocr/c/U9mysQuhRpU/m/7aNrZACXBQAJ.
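
For anyone who wants to inspect or extend the word list, here is a minimal sketch of the usual round trip with the Tesseract training tools `combine_tessdata`, `dawg2wordlist` and `wordlist2dawg`. It only builds the command lines; the `work` directory, the output file names and the component names `deu_latf.lstm-unicharset` / `deu_latf.lstm-word-dawg` are assumptions about a typical LSTM model layout, not something confirmed in this thread, so check what the unpack step actually produces.

```python
from pathlib import Path
import shutil
import subprocess


def unpack_command(traineddata: Path, workdir: Path) -> list[str]:
    """Command that unpacks all components of a .traineddata file.

    combine_tessdata expects an output prefix ending in a dot,
    e.g. "work/deu_latf.".
    """
    prefix = str(workdir / traineddata.stem) + "."
    return ["combine_tessdata", "-u", str(traineddata), prefix]


def wordlist_command(lang: str, workdir: Path, wordlist: Path) -> list[str]:
    """Command that decodes the word DAWG into a plain-text word list.

    Assumption: the model contains <lang>.lstm-unicharset and
    <lang>.lstm-word-dawg after unpacking.
    """
    unicharset = workdir / f"{lang}.lstm-unicharset"
    dawg = workdir / f"{lang}.lstm-word-dawg"
    return ["dawg2wordlist", str(unicharset), str(dawg), str(wordlist)]


def rebuild_commands(lang: str, workdir: Path, wordlist: Path,
                     traineddata: Path) -> list[list[str]]:
    """Commands that re-encode an edited word list and put it back."""
    unicharset = workdir / f"{lang}.lstm-unicharset"
    dawg = workdir / f"{lang}.lstm-word-dawg"
    return [
        ["wordlist2dawg", str(wordlist), str(dawg), str(unicharset)],
        # -o overwrites the named component inside the traineddata file.
        ["combine_tessdata", "-o", str(traineddata), str(dawg)],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run the commands in order; fail early if a tool is missing."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
```

Work on a copy of the model, since the last command modifies the `.traineddata` file in place. A word list only nudges the LSTM output, so it may not be enough to fix systematic confusions such as "d)" for "ch".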

@stweil
Collaborator

stweil commented Apr 1, 2024

Don't use deu_latf for Fraktur. Try https://zenodo.org/records/10125246 instead.

More models here: https://ub-backup.bib.uni-mannheim.de/~stweil/tesstrain/.
My latest models for historic texts are called "german_print*".
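
As a rough sketch of how a downloaded model could be tried out, the helper below builds a `tesseract` command line that points at a local model directory. The model name `german_print`, the `tessdata` directory and the file names are placeholders for illustration; use the actual name of the `.traineddata` file you downloaded (without the extension).

```python
from pathlib import Path


def ocr_command(image: Path, output_base: Path,
                model: str = "german_print",          # hypothetical model name
                tessdata_dir: Path = Path("tessdata")  # hypothetical directory
                ) -> list[str]:
    """Build a tesseract call that uses a model from a local directory.

    tessdata_dir must contain <model>.traineddata; the recognized text
    is written to <output_base>.txt.
    """
    return [
        "tesseract", str(image), str(output_base),
        "--tessdata-dir", str(tessdata_dir),
        "-l", model,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be checked first.
    print(" ".join(ocr_command(Path("page.png"), Path("page"))))
```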

@stweil
Collaborator

stweil commented Apr 1, 2024

See also https://ocr-bw.bib.uni-mannheim.de/faq/ (German).

@Stond0cyborg
Author

Then thank you very much, Mr. Weil, and I wish you continued success with your program! ;)
