Need Mongolian traineddata #85
Does anyone have an update on training the Mongolian language?
There are some repositories on GitHub, such as khangaikh/tesseract-mon and dolugen/tesseract-mnc, and maybe more. However, Tesseract itself seems to be missing some code for Mongolian; see ccmain/pageiterator.cpp.
http://www.alanwood.net/unicode/mongolian.html
Both of these are for Mongolian-Cyrillic. The Tesseract repos also have mon.traineddata; I'm not sure whether it is Cyrillic or otherwise: https://github.com/tesseract-ocr/tessdata_fast/blob/master/mon.traineddata https://github.com/tesseract-ocr/tessdata_best/blob/master/mon.traineddata
I checked the wordlist from mon.traineddata. Here is a sample from it:
So it looks like it is Mongolian-Cyrillic.
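A quick way to make this kind of check repeatable is to classify every word in the extracted wordlist by Unicode block: Mongolian-Cyrillic uses the Cyrillic block (U+0400–U+04FF), while traditional Mongolian script lives in the Mongolian block (U+1800–U+18AF). The sketch below assumes the wordlist has already been pulled out of mon.traineddata as plain UTF-8 text (for example with Tesseract's combine_tessdata and dawg2wordlist training tools); the helper names script_of and summarize are hypothetical.

```python
from collections import Counter

# Unicode block ranges from the Unicode standard.
CYRILLIC = range(0x0400, 0x0500)   # U+0400..U+04FF, includes Ө (U+04E8) and Ү (U+04AE)
MONGOLIAN = range(0x1800, 0x18B0)  # U+1800..U+18AF, traditional Mongolian script


def script_of(word: str) -> str:
    """Classify a word as 'cyrillic', 'mongolian', 'mixed' or 'other'."""
    has_cyr = any(ord(ch) in CYRILLIC for ch in word)
    has_mon = any(ord(ch) in MONGOLIAN for ch in word)
    if has_cyr and has_mon:
        return "mixed"
    if has_cyr:
        return "cyrillic"
    if has_mon:
        return "mongolian"
    return "other"  # digits, Latin, punctuation, ...


def summarize(words) -> dict:
    """Count how many words fall into each script class (hypothetical helper)."""
    return dict(Counter(script_of(w) for w in words))


if __name__ == "__main__":
    sample = ["Монгол", "улс", "ᠮᠣᠩᠭᠣᠯ", "123"]
    print(summarize(sample))  # {'cyrillic': 2, 'mongolian': 1, 'other': 1}
```

To check a real wordlist, feed summarize() the stripped, non-empty lines of the file; a result dominated by "cyrillic" points to Mongolian-Cyrillic training data.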
Ref: https://en.wikipedia.org/wiki/Mongolian_writing_systems
Were you looking for Mongolian-Cyrillic or traditional Mongolian traineddata?
Mongolian written in the Mongolian script is written vertically, top to bottom, with the columns running from left to right. https://github.com/tesseract-ocr/tesseract/blob/master/ccmain/pageiterator.cpp#L543 seems to be related to that. However, mon.traineddata, which is Mongolian in Cyrillic, does not require it. Here is a sample of a wordlist for Mongolian written in the Mongolian script, taken from http://crubadan.org/languages/mn-Mong
Related info:
http://scriptsource.org/cms/scripts/page.php?item_id=script_detail&key=Mong
https://www.ethnologue.com/language/mvf
https://groups.google.com/forum/#!msg/tesseract-ocr/EjnYPwmx7UM/lmzi37oKjQsJ
http://www.babelstone.co.uk/Mongolian/Report170.pdf
https://r12a.github.io/mongolian-variants/
@Shreeshrii @stweil Hi, thanks for your replies! As you mentioned, @Shreeshrii, I am also not sure whether the tessdata_best mon.traineddata file was trained on traditional Mongolian or on Cyrillic.
I found that the traineddata version did not match the Tesseract engine version, which is a different issue from what we are discussing here. I got it working on Cyrillic text data when I trained it. @Shreeshrii, I will update the traineddata file with a wordlist too. @Skeetfly, for LPR (license plate recognition), you can apply a regex to the text recognised by Tesseract. I hope it's useful.
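As a rough illustration of the regex post-processing idea for LPR: the sketch below scans raw Tesseract output (for example from `tesseract plate.png stdout -l mon`) for plate-shaped strings. It assumes, for illustration only, that a plate is 4 digits followed by 3 Cyrillic capital letters (like "1234 УБА"); that format, and the names PLATE_RE and extract_plates, are assumptions to check against real plates, not something established in this thread.

```python
import re
from typing import List

# ASSUMPTION: plate = 4 ASCII digits, optional whitespace, 3 Cyrillic capitals.
# The letter class covers basic Cyrillic capitals plus Ё, Ө and Ү.
# The lookbehind/lookahead stop matches inside longer digit or letter runs.
PLATE_RE = re.compile(r"(?<![0-9])([0-9]{4})\s*([А-ЯЁӨҮ]{3})(?![А-ЯЁӨҮ])")


def extract_plates(ocr_text: str) -> List[str]:
    """Return plate candidates from raw OCR text, normalised to '1234 УБА'."""
    # Uppercase first so lowercase OCR output (e.g. 'уба') still matches.
    text = ocr_text.upper()
    return [f"{digits} {letters}" for digits, letters in PLATE_RE.findall(text)]


if __name__ == "__main__":
    print(extract_plates("Plate: 1234 уба\n"))  # ['1234 УБА']
```

Filtering after recognition keeps Tesseract's language model untouched; anything that does not fit the expected pattern is simply discarded rather than forced into a plate.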
Is there any progress in the work on traditional Mongolian?
I don't know of anyone who is working on it.
I'm thinking about using Tesseract for LPR. How good is it?