How to generate trie file? #4

Open
EuphoriaCelestial opened this issue Oct 23, 2020 · 4 comments
Comments

@EuphoriaCelestial

Hi,
I have successfully run all the steps in the README and now have bible.arpa and bible.binary, but there is no trie file.
How can I generate the trie? I can't find any tutorial about this.

@kmario23
Owner

Hey @EuphoriaCelestial,
trie is a data structure that's used when binarizing the model. Please have a look here for more info: kenlm/data-structures.

So, just passing trie as the data structure type to build_binary should solve the issue.
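
For reference, here is a minimal sketch of that invocation, written as a small Python helper so the argument order is explicit. It assumes the build_binary executable sits at kenlm/bin/build_binary (adjust to wherever your build put it) and that the data structure type comes before the ARPA file. build_binary_command is a hypothetical helper name for illustration, not part of KenLM.

```python
import shlex


def build_binary_command(arpa_path, output_path, data_structure="trie",
                         build_binary="kenlm/bin/build_binary"):
    """Return the argument list for a KenLM build_binary call.

    The default path to the executable is an assumption; change it to
    match your own build directory.
    """
    # KenLM's build_binary offers two data structure types.
    if data_structure not in ("probing", "trie"):
        raise ValueError(f"unknown data structure: {data_structure!r}")
    # Order: executable, type, input ARPA file, output binary file.
    return [build_binary, data_structure, arpa_path, output_path]


cmd = build_binary_command("bible.arpa", "bible.binary")
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To actually run it, the list can be handed to subprocess.run(cmd, check=True). Note that this still writes a single bible.binary file; the trie is the internal layout of that file rather than a separate output.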

@EuphoriaCelestial
Author

I have tried this command: kenlm/bin/build_binary -T /tmp/trie -S 1G trie bible.arpa bible.binary, but I get this error every time:

Reading bible.arpa
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
Segmentation fault (core dumped)

@kmario23
Owner

This seems to be a recurring issue. Cf. kenlm/issues/248, /letter-based-language-model/33986

Some suggestions:

  • There's a Discourse forum for DeepSpeech-related issues where you can get help: https://discourse.mozilla.org/c/mozilla-voice-stt/247
  • Recheck that all dependencies are installed correctly, or reinstall kenlm. The Boost libs might cause issues.
  • Segmentation fault (core dumped) means the C/C++ binary itself crashed. It seems to me that there's something wrong with the .arpa file.
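
One way to test the last suggestion is a quick structural check of the ARPA file. The sketch below is only a rough sanity check under the usual ARPA layout (a \data\ header with "ngram N=count" lines, one \N-grams: section per order, and a closing \end\ marker). check_arpa is a hypothetical helper, not a KenLM function, and a clean result does not guarantee that build_binary will accept the file.

```python
import re


def check_arpa(lines):
    """Return a list of problems found in the lines of an ARPA file."""
    declared = {}   # order -> count announced in the \data\ header
    found = {}      # order -> number of entries actually read
    problems = []
    order = None    # order of the section currently being read
    saw_data = saw_end = False

    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue
        if line == "\\data\\":
            saw_data = True
            continue
        if line == "\\end\\":
            saw_end = True
            break
        header = re.fullmatch(r"ngram (\d+)=(\d+)", line)
        if header and order is None:
            declared[int(header.group(1))] = int(header.group(2))
            continue
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            order = int(section.group(1))
            found.setdefault(order, 0)
            continue
        if order is None:
            problems.append(f"line {number}: text before any n-gram section")
            continue
        # An entry is: log10 probability, `order` words, optional backoff.
        fields = line.split()
        if len(fields) not in (order + 1, order + 2):
            problems.append(f"line {number}: wrong field count for a {order}-gram")
            continue
        try:
            float(fields[0])
        except ValueError:
            problems.append(f"line {number}: probability is not a number")
            continue
        found[order] += 1

    if not saw_data:
        problems.append("missing \\data\\ header")
    if not saw_end:
        problems.append("missing \\end\\ marker (file may be truncated)")
    for n in sorted(set(declared) | set(found)):
        if declared.get(n) != found.get(n, 0):
            problems.append(
                f"{n}-grams: header says {declared.get(n)}, found {found.get(n, 0)}")
    return problems


# Tiny made-up ARPA text, used only to exercise the checker.
sample = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-1.0\t<s>\t-0.5
-0.7\tthe\t-0.3
-0.9\t</s>

\\2-grams:
-0.4\t<s> the
-0.6\tthe </s>

\\end\\
""".splitlines()

print(check_arpa(sample))
```

For a real file, pass the open file object instead of the sample, for example check_arpa(open("bible.arpa", encoding="utf-8")). The checker reads line by line, so it does not need to hold the whole model in memory.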

@EuphoriaCelestial
Author

I have tried a clean install on another machine with better specs (i7, 32 GB RAM, 2080 Ti) but still got the same error.
The .arpa file seems fine. I think so because I can use it to score sentences normally, and it gives the correct score for the example in the README.
