Release Direction to General Purpose NLP Library for Turkish · GlobalMaksimum/sadedegel

Release 0.17 introduces several NLP capabilities in sadedegel that are not related to summarization.

News

- Starting with this release, sadedegel ships prebuilt models for various basic NLP tasks, so that developers can load and use them with minimal configuration.
  - Our first model is a news classifier (thanks to Taner Sezer for his corpus support).
- We now report the accuracy of our word tokenizers to identify potential improvements for future releases (thanks to Taner Sezer for his corpus support).
- To support the development of prebuilt models, an sklearn-compatible extension.sklearn module is introduced for feature engineering.
- Token.is_stopword is added to flag stopword tokens.
- LexRankSummarizer (based on the external lexrank module; to be deprecated in a future release) and LexRankPureSummarizer (a pure sadedegel implementation of the same method) are added to the set of extractive summarizers.
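For readers new to the method, LexRank (Erkan and Radev, 2004) scores each sentence by its centrality in a sentence-similarity graph. The following is a minimal pure-Python sketch of that idea, not the code of LexRankPureSummarizer: the function names, the naive whitespace tokenization, the smoothed IDF, and the defaults (similarity threshold 0.1, damping 0.85) are our own assumptions for illustration.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration; these are not sadedegel's API.


def idf_table(sentences):
    """Smoothed IDF, treating each tokenized sentence as a 'document'."""
    n = len(sentences)
    df = Counter(tok for sent in sentences for tok in set(sent))
    return {tok: math.log(n / df_t) + 1.0 for tok, df_t in df.items()}


def tfidf_vector(sent, idf):
    """Sparse TF-IDF vector (dict) for one tokenized sentence."""
    return {tok: count * idf[tok] for tok, count in Counter(sent).items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def lexrank_scores(sentences, threshold=0.1, damping=0.85, tol=1e-6, max_iter=100):
    """Return one centrality score per tokenized sentence; scores sum to 1."""
    n = len(sentences)
    idf = idf_table(sentences)
    vecs = [tfidf_vector(s, idf) for s in sentences]
    # Unweighted graph: link sentences whose similarity exceeds the threshold.
    rows = []
    for i in range(n):
        row = [1.0 if cosine(vecs[i], vecs[j]) > threshold else 0.0 for j in range(n)]
        total = sum(row)
        # Row-normalize into a stochastic matrix; empty rows become uniform.
        rows.append([x / total for x in row] if total else [1.0 / n] * n)
    # Power iteration with PageRank-style damping.
    scores = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            (1 - damping) / n + damping * sum(scores[i] * rows[i][j] for i in range(n))
            for j in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, scores)) < tol:
            return new
        scores = new
    return scores


def summarize(sentences, k=2, **kwargs):
    """Indices of the k most central sentences, in original document order."""
    scores = lexrank_scores(sentences, **kwargs)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


sentences = [
    "Kedi bahçede oynuyor",
    "Kedi ve köpek bahçede oynuyor",
    "Köpek kulübesinde uyuyor",
    "Borsa bugün yükselişle açıldı",
]
# Naive whitespace tokenization keeps the sketch dependency-free.
tokens = [s.lower().split() for s in sentences]
print(summarize(tokens, k=1))  # → [1]
```

The second sentence overlaps with both the first and the third, so it ends up most central. The unrelated fourth sentence is linked only to itself and keeps a baseline score of 1/n, which is one reason LexRank is usually combined with a length or redundancy budget when picking sentences.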

- The sents property on Doc is dropped; iterate over the Doc itself (Doc.__iter__) instead.
- The tf property on Doc is deprecated (it will be dropped in 0.18) in favor of the get_tf function, which offers a more flexible way to access document-level TF vectors.
- The tfidf function on Doc is deprecated (it will be dropped in 0.18) in favor of the get_tfidf function, which offers a more flexible way to access document-level TF-IDF vectors.
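The deprecation path for tf can be pictured with a small stand-in class. This is a hedged sketch, not sadedegel's code: the Doc class here only wraps a flat token list, and the tf_type parameter with its "raw", "binary" and "log_norm" options is a hypothetical argument chosen to show why a function is more flexible than a fixed property. The deprecated tf property keeps old call sites working while emitting a DeprecationWarning.

```python
import math
import warnings
from collections import Counter


class Doc:
    """Hypothetical stand-in for a document, reduced to a flat token list."""

    def __init__(self, tokens):
        self.tokens = list(tokens)

    def get_tf(self, tf_type="raw"):
        """Return a term-frequency dict.

        `tf_type` is an assumed parameter, used only to illustrate the
        flexibility a function offers over a fixed property.
        """
        counts = Counter(self.tokens)
        if tf_type == "raw":
            return dict(counts)
        if tf_type == "binary":
            return {tok: 1 for tok in counts}
        if tf_type == "log_norm":
            return {tok: 1.0 + math.log(c) for tok, c in counts.items()}
        raise ValueError(f"unknown tf_type: {tf_type!r}")

    @property
    def tf(self):
        """Deprecated alias: warns, then delegates to get_tf() with defaults."""
        warnings.warn(
            "Doc.tf is deprecated and will be dropped in 0.18; use Doc.get_tf() instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not at this property
        )
        return self.get_tf()


doc = Doc(["kedi", "kedi", "köpek"])
print(doc.get_tf()["kedi"])  # → 2
```

Running a test suite with python -W error::DeprecationWarning turns such warnings into errors, which is one way to find remaining uses of a deprecated attribute before it is removed.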

To reduce code duplication, we moved the TF and IDF implementations out of Sentence and Doc into separate classes, which Sentence and Doc now both inherit from using Python's multiple inheritance.
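The general shape of such a refactoring can be sketched with two mixin classes. This minimal sketch is an assumption about the pattern, not sadedegel's actual class layout: TFMixin and IDFMixin implement term frequency and smoothed IDF once, and Sentence and Doc only have to expose a tokens attribute. The class names, the doc_freq and n_docs attributes, the toy corpus statistics, and the scikit-learn-style smoothed IDF, ln((1 + N) / (1 + df)) + 1, are all chosen for illustration.

```python
import math
from collections import Counter


class TFMixin:
    """Term frequency, written once for any class that exposes `tokens`."""

    def get_tf(self):
        return dict(Counter(self.tokens))


class IDFMixin:
    """Smoothed IDF from corpus statistics exposed as `doc_freq` and `n_docs`."""

    def get_idf(self):
        return {
            tok: math.log((1 + self.n_docs) / (1 + self.doc_freq.get(tok, 0))) + 1.0
            for tok in set(self.tokens)
        }

    def get_tfidf(self):
        # get_tf() is not defined here; it is found on TFMixin through the MRO.
        idf = self.get_idf()
        return {tok: count * idf[tok] for tok, count in self.get_tf().items()}


# Hypothetical corpus statistics; a real library would derive these from a corpus.
DOC_FREQ = {"kedi": 2, "köpek": 1}
N_DOCS = 3


class Sentence(TFMixin, IDFMixin):
    doc_freq, n_docs = DOC_FREQ, N_DOCS

    def __init__(self, tokens):
        self.tokens = list(tokens)


class Doc(TFMixin, IDFMixin):
    doc_freq, n_docs = DOC_FREQ, N_DOCS

    def __init__(self, sentences):
        self.sentences = list(sentences)

    @property
    def tokens(self):
        # Document-level tokens are the concatenation of sentence tokens.
        return [tok for sent in self.sentences for tok in sent.tokens]

    def __iter__(self):
        # Iterating over a Doc yields its sentences.
        return iter(self.sentences)


doc = Doc([Sentence(["kedi", "köpek"]), Sentence(["kedi"])])
print(doc.get_tf()["kedi"])  # → 2
```

Because get_tfidf lives in IDFMixin but calls get_tf from TFMixin, the two mixins cooperate only through the class that combines them; Python's method resolution order (here Sentence, TFMixin, IDFMixin, object) decides where each method is found.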