This repository has been archived by the owner on Jun 11, 2020. It is now read-only.

How can I train this model with Arabic or Urdu characters? #34

Open
ghulammustufa31 opened this issue Feb 15, 2019 · 1 comment

Comments

@ghulammustufa31

My labels contain Arabic/Urdu text.
For example "اسلام آباد : چیئرمین رضابانی کی زیر صدارت سینیٹ کا اجلاس"

What changes are required to train the model on non-English labels?

@Belval
Owner

Belval commented Feb 15, 2019

According to Britannica, Arabic has 28 letters, which means it should be more compatible with the CRNN architecture than a logographic script like Chinese, which has thousands of distinct characters. I think you can expect somewhat workable results by simply replacing the values in CRNN/config.py with ones for your character set. Since Arabic is read right to left, you might encounter some issues, but you'll have to try it to be sure.

For Urdu, the same process can be applied, but some characters seem to be very wide. Since CRNN is not attention-based, this could make it very hard for training to converge.
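
As a rough illustration of what "replacing the values" might involve, here is a minimal sketch that derives a character set from the training labels and converts a label to class indices. The helper names `build_charset` and `encode_label` are hypothetical and not part of this repository. The sketch assumes the config ultimately needs a string of allowed characters plus one extra class for the CTC blank, and that reversing a right-to-left label is enough to match the left-to-right order in which the image columns are read. That reversal is only a simplification: it would not handle labels that mix in digits or Latin text.

```python
def build_charset(labels):
    """Collect every distinct character in the labels, in a stable sorted order."""
    return "".join(sorted(set("".join(labels))))


def encode_label(label, charset, right_to_left=True):
    """Map a label to class indices.

    Assumption: for right-to-left text the stored (logical) order is reversed
    so that it matches the left-to-right order of the image columns.
    """
    text = label[::-1] if right_to_left else label
    return [charset.index(ch) for ch in text]


def decode_indices(indices, charset, right_to_left=True):
    """Inverse of encode_label, useful for checking the round trip."""
    text = "".join(charset[i] for i in indices)
    return text[::-1] if right_to_left else text


# Example labels taken from the text in the question above.
labels = ["اسلام آباد", "سینیٹ کا اجلاس"]

charset = build_charset(labels)
num_classes = len(charset) + 1  # one extra class for the CTC blank

encoded = encode_label(labels[0], charset)
restored = decode_indices(encoded, charset)
```

If the round trip through `encode_label` and `decode_indices` returns the original label, the character set at least covers the training data. Whether the reversed order actually helps would have to be checked by training with and without it.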
