Russian Words Clusters offers a way to cluster russian words by criterias (by a common stem, by the closeness of vowels or consonants).
For now it supports verbs but was not built for clusterings words that may have different suffixes, as would be a noun and an adjective of a same stem.
It offers options:
- to merge clusters built on different criterias
- to input words pairs, useful for clustering verbs and their aspects
The clustering algorithm can be used either with the CLI or with the class's methods.
Content of file1
:
выстрелить
отличать
застрелить
отличить
python3.8 clustering.py --input file1 --criterias STEM TRANS --merge
выстрелить
застрелить
отличать
отличить
The CLI clustering.py
offers the possibility to cluster words and words pairs.
Classes in clustering.py
can be called to cluster words. As for usage examples, you can refer to the code in the Main part of clustering.py
, or to the code contained in the tests
folder.
Project has a pip package: https://pypi.org/project/russian-words-clusters/
Once the package installed you can import classes into your Python code using from russianwords.clustering import *
.
Content of file2
:
посещать/посетить
разделять
разбираться/разобраться
выделиться/выделить
изменять/изменить
выделяться/выделять
python3.8 clustering.py --input file2 --criterias STEM TRANS --merge --are-pairs
посещать/посетить
разделять
выделяться/выделять
выделиться/выделить
разбираться/разобраться
изменять/изменить