POS tags ? distinguishing some patterns #85

Open
eroux opened this issue Nov 14, 2021 · 2 comments

@eroux
Contributor

eroux commented Nov 14, 2021

For a phonetics use case, I need to distinguish the two pronunciations of བ (ba or wa), but this currently seems impossible with botok:

  • རབ་གསལ་བས is tokenized as རབ་གསལ་ - བས (in that case བས is pronounced with the wa sound)
  • བྱང་ཆུབ་བར་དུ is tokenized as བྱང་ཆུབ་ - བར་ - དུ (in that case བར is pronounced bar)

Is there any way I can discriminate between the two with botok (or any other tool)?

@ngawangtrinley
Contributor

བར་དུ་ should be added to the vocab. I would argue that it's a frozen expression by now. We'll add instructions on how to do this to the botok docs.
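
To illustrate why adding the expression to the vocab changes the segmentation, here is a minimal sketch of a toy greedy longest-match tokenizer. It is not botok's implementation or API: the `syllables` and `tokenize` functions and both vocab sets are hypothetical names made up for this example, and how botok's own vocab is extended is not shown here.

```python
TSEK = "་"  # Tibetan syllable delimiter (U+0F0B)


def syllables(text):
    """Split Tibetan text on the tsek, dropping empty pieces."""
    return [s for s in text.split(TSEK) if s]


def tokenize(text, vocab):
    """Greedy longest match over syllables.

    `vocab` is a set of syllable tuples. A syllable that starts no
    vocab entry is emitted as a single-syllable token.
    """
    syls = syllables(text)
    max_len = max((len(entry) for entry in vocab), default=1)
    tokens, i = [], 0
    while i < len(syls):
        # Try the longest candidate first, then shrink down to one syllable.
        for n in range(min(max_len, len(syls) - i), 0, -1):
            candidate = tuple(syls[i:i + n])
            if n == 1 or candidate in vocab:
                tokens.append(TSEK.join(candidate))
                i += n
                break
    return tokens


# Hypothetical toy vocab, only large enough for the example from this issue.
base_vocab = {("བྱང", "ཆུབ"), ("བར",), ("དུ",)}
# Same vocab with the frozen expression added as a single entry.
extended_vocab = base_vocab | {("བར", "དུ")}

text = "བྱང་ཆུབ་བར་དུ"
print(tokenize(text, base_vocab))      # 3 tokens: བར and དུ stay separate
print(tokenize(text, extended_vocab))  # 2 tokens: བར་དུ is kept whole
```

Once བར་དུ comes out as a single token, a downstream phonetics step can key its pronunciation on that token instead of on a bare བར.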

@eroux
Contributor Author

eroux commented Nov 15, 2021

Well, what I'll do with another POS tagger is look at the n.rel tag from https://web.archive.org/web/20170824153724/http://larkpie.net/tibetancorpus/tags
