Larger wordlists #17

Open
6 of 7 tasks
Samyak2 opened this issue Apr 5, 2022 · 3 comments
Labels
enhancement New feature or request words Word selection, word lists, language support, etc.

Comments

@Samyak2
Owner

Samyak2 commented Apr 5, 2022

What and why?

Currently, the only built-in word list is the top 250 words list. This is very limiting: with so few words to draw from, the same word often repeats within a single line and many times over the course of a test.

It would be nice to have these word lists too:

How?

More info about the existing word list: https://docs.rs/toipe/latest/toipe/wordlists/constant.TOP_250.html

The word list needs to be added in this directory: https://github.com/Samyak2/toipe/tree/main/src/word_lists

and it needs to be listed here: https://github.com/Samyak2/toipe/blob/main/src/wordlists.rs

@Samyak2 Samyak2 added enhancement New feature or request words Word selection, word lists, language support, etc. labels Apr 5, 2022
Samyak2 added a commit that referenced this issue Apr 8, 2022
towards #17

also modified existing top250 list to remove non-alphabetic words (added
an extra word in place of the one removed)

word lists added:
- top500
- top1000
- top2500
- top5000

all of them are from the same source as the initial top250 list
@Samyak2
Owner Author

Samyak2 commented Apr 8, 2022

The source I was using offers only 5000 words for free. I added the top 500 through top 5000 word lists in 7c049c5, which is coming in v0.4.0.

@benliepert
Contributor

I think the word list size is misleading: textgen.rs only uses words between 2 and 8 characters long. In the 5000-word list, for example, 927 words don't meet this criterion (925 of the 927 are longer than 8 characters).
Maybe you could allow the word length range to be specified as a parameter (but default to between 2 and 8)?

@Samyak2
Copy link
Owner Author

Samyak2 commented Apr 13, 2022

> I think the word list size is misleading: textgen.rs only uses words between 2 and 8 characters long. In the 5000-word list, for example, 927 words don't meet this criterion (925 of the 927 are longer than 8 characters). Maybe you could allow the word length range to be specified as a parameter (but default to between 2 and 8)?

Good catch! The 2 to 8 chars filter was quite arbitrary. --min-length and --max-length flags to specify this would be nice, although that will require a bit of work to make the RawWordSelector store the ToipeConfig too.
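
For illustration, here is a minimal sketch of what a configurable length filter could look like. It is not toipe's actual code: `LengthBounds` and `filter_by_length` are hypothetical names, the defaults simply mirror the 2 to 8 character range mentioned above, and counting Unicode characters rather than bytes is an assumption.

```rust
/// Hypothetical length bounds (not part of toipe). The defaults mirror
/// the 2 to 8 character filter discussed in this thread.
#[derive(Debug, Clone, Copy)]
struct LengthBounds {
    min: usize,
    max: usize,
}

impl Default for LengthBounds {
    fn default() -> Self {
        Self { min: 2, max: 8 }
    }
}

/// Keeps only the words whose character count lies within `bounds`
/// (inclusive on both ends). Assumption: length is measured in chars.
fn filter_by_length<'a>(words: &[&'a str], bounds: LengthBounds) -> Vec<&'a str> {
    words
        .iter()
        .copied()
        .filter(|w| (bounds.min..=bounds.max).contains(&w.chars().count()))
        .collect()
}

fn main() {
    let words = ["a", "the", "keyboard", "important", "understanding"];

    // Default bounds drop the 1-character word and the two long words.
    let kept = filter_by_length(&words, LengthBounds::default());
    println!("{:?}", kept); // ["the", "keyboard"]

    // Wider bounds, as hypothetical --min-length/--max-length flags might supply.
    let wide = filter_by_length(&words, LengthBounds { min: 1, max: 20 });
    println!("{} of {} words kept", wide.len(), words.len()); // 5 of 5 words kept
}
```

With something along these lines, the values parsed from the proposed `--min-length` and `--max-length` flags would only need to reach the word selector as a small struct, which may be less invasive than storing the whole config there.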
