Incorporate jmnedict database #3

Open
tshatrov opened this issue Jun 22, 2015 · 2 comments

@tshatrov
Owner

Lately, the few place names that exist in jmdict are being moved to jmnedict. If this continues, ichi.moe won't be able to recognize words like Tokyo, which is unacceptable. We need to incorporate jmnedict names without messing up the segmenting algorithm. Kanji names should be the top priority; katakana names are less important and can be ignored for now. The imported names should score lower than regular words so as not to pollute the results.
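
To illustrate the idea of scoring names below regular words, here is a minimal sketch in Python. It is not ichiran's actual segmenter: the `Entry` type, the `NAME_PENALTY` value, the length-squared base score, and the `segment` function are all assumptions made up for this example. It only shows how a penalty lets names fill gaps without overriding regular dictionary words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    text: str
    is_name: bool  # True if the entry came from a names dictionary


# Hypothetical penalty: a name is worth half of a regular word of equal length.
NAME_PENALTY = 0.5


def score(entry):
    """Longer matches score higher; names are scaled down by NAME_PENALTY."""
    base = len(entry.text) ** 2
    return base * NAME_PENALTY if entry.is_name else base


def segment(text, entries):
    """Return the highest-scoring split of text as (piece, kind) pairs."""
    by_text = {}
    for entry in entries:
        by_text.setdefault(entry.text, []).append(entry)

    # best[i] holds (total score, pieces) for the prefix text[:i].
    best = [None] * (len(text) + 1)
    best[0] = (0.0, [])
    for i in range(len(text)):
        total, path = best[i]
        # Fallback: an unmatched character advances by one and scores zero.
        candidates = [(i + 1, 0.0, (text[i], "unknown"))]
        for j in range(i + 1, len(text) + 1):
            for entry in by_text.get(text[i:j], []):
                kind = "name" if entry.is_name else "word"
                candidates.append((j, score(entry), (entry.text, kind)))
        for j, gain, piece in candidates:
            if best[j] is None or total + gain > best[j][0]:
                best[j] = (total + gain, path + [piece])
    return best[len(text)][1]
```

With this scoring, a name is still picked when nothing else covers the span, but a regular word with the same text wins over the name entry. How large the penalty should be in practice is an open question that this sketch does not answer.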

@tshatrov tshatrov self-assigned this Jun 22, 2015
@buster-blue

buster-blue commented Oct 20, 2020

Any updates on this? I don't know much about databases, but I feel like this wouldn't be too hard to do and it would make the parser much more useful, since it wouldn't just break whenever it came across proper nouns anymore. I'm just curious because the issue is still open, but it's from 5 years ago. If you've just been too busy, that's fine, or maybe it's harder to do than I thought.

@tshatrov
Owner Author

I decided not to do this because it would likely degrade segmenting a lot. Proper nouns can't be consistently romanized anyway. I'll be adding things that can be romanized, such as place names, separately. For example, I have already added all municipalities that currently exist in Japan. I'll be looking for other databases that I can incorporate without breaking too much. That said, pull requests for jmnedict integration are welcome.
