Towards a General-Purpose NLP Library for Turkish
The 0.17 release introduces several NLP capabilities in sadedegel that are not related to summarization.
News
- Starting with this release, sadedegel now ships prebuilt models for various basic NLP tasks. The purpose is to allow developers to load & use those models with minimal configuration.
- Our first model is a news classifier (Thanks Taner Sezer for his corpus support)
- We report the accuracy of our word tokenizers to identify potential enhancement points for future releases (Thanks Taner Sezer for his corpus support)
- To support the development of prebuilt models, the sklearn-compatible extension.sklearn module is introduced for feature engineering.
- Token.is_stopword is added to flag stopword token types.
- LexRankSummarizer (based on the external lexrank module, to be deprecated in future releases) and LexRankPureSummarizer (a pure sadedegel version of the same method) are added to the set of extractive summarizers.
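For readers unfamiliar with the method behind the two LexRank summarizers, the following is a minimal, standard-library-only sketch of continuous LexRank scoring. It is not sadedegel's implementation: the function names (cosine, lexrank_scores), the whitespace tokenization, the use of raw term counts instead of tf-idf weights, and the damping and iteration defaults are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by damped power iteration over a similarity graph.

    Assumption: whitespace tokens and raw counts stand in for a real
    tokenizer and tf-idf weighting.
    """
    vectors = [Counter(s.split()) for s in sentences]
    n = len(vectors)
    sim = [[cosine(vectors[i], vectors[j]) for j in range(n)] for i in range(n)]

    # Row-normalize so each row is a probability distribution over sentences.
    rows = []
    for row in sim:
        total = sum(row)
        rows.append([v / total for v in row] if total else [1.0 / n] * n)

    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * rows[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


sentences = ["ekonomi spor sanat", "ekonomi büyüdü", "spor kulübü", "sanat sergisi"]
scores = lexrank_scores(sentences)
best = max(range(len(sentences)), key=scores.__getitem__)
print(sentences[best])
```

An extractive summarizer built on such scores would keep the top-ranked sentences; here the first sentence ranks highest because it shares a word with every other sentence.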
Feature Drop & Deprecation
- sents property on Doc is dropped. Use __iter__(Doc) instead.
- tf property on Doc is deprecated (will be dropped by 0.18) in favor of the get_tf function, which gives a more flexible way to access document-level tf vectors.
- tfidf function on Doc is deprecated (will be dropped by 0.18) in favor of the get_tfidf function, which gives a more flexible way to access document-level tf-idf vectors.
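The deprecation pattern described above (an old property kept for one release while delegating to a more flexible function) can be sketched as follows. The LegacyDoc class, the normalize keyword argument, and the warning text are hypothetical and only illustrate the idea; they do not reproduce sadedegel's Doc or the actual get_tf signature.

```python
import warnings
from collections import Counter


class LegacyDoc:
    """Hypothetical stand-in for a document class (not sadedegel's Doc)."""

    def __init__(self, text):
        # Assumption: whitespace tokenization, for brevity only.
        self.tokens = text.split()

    def get_tf(self, normalize=False):
        """Return term frequencies; `normalize` is an illustrative option."""
        counts = Counter(self.tokens)
        if normalize:
            total = sum(counts.values())
            return {term: count / total for term, count in counts.items()}
        return dict(counts)

    @property
    def tf(self):
        """Deprecated accessor that warns and delegates to get_tf()."""
        warnings.warn(
            "tf is deprecated and will be dropped by 0.18; use get_tf() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.get_tf()


doc = LegacyDoc("kedi evde kedi")
print(doc.get_tf())                # raw counts
print(doc.get_tf(normalize=True))  # relative frequencies
```

Callers that still read the old property keep working for one more release but see a DeprecationWarning pointing at their own call site, thanks to stacklevel=2.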
Others
- We have moved the TF and IDF implementations out of Sentence and Doc into separate classes, using Python's multiple inheritance support to reduce code duplication.
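As a rough illustration of this refactoring, the sketch below shares TF and IDF logic between a sentence class and a document class through two mixins. All names (TFMixin, IDFMixin, SketchSentence, SketchDoc) and the doc_freq / n_docs attributes are assumptions made for the example, not sadedegel's actual class layout.

```python
import math
from collections import Counter


class TFMixin:
    """Term-frequency logic for any class exposing a `tokens` list."""

    def get_tf(self):
        return Counter(self.tokens)


class IDFMixin:
    """IDF logic; expects `tokens`, a `doc_freq` mapping and an `n_docs` count."""

    def get_idf(self):
        return {
            term: math.log(self.n_docs / self.doc_freq.get(term, 1))
            for term in set(self.tokens)
        }


class SketchSentence(TFMixin, IDFMixin):
    def __init__(self, tokens, doc_freq, n_docs):
        self.tokens = list(tokens)
        self.doc_freq = doc_freq
        self.n_docs = n_docs


class SketchDoc(TFMixin, IDFMixin):
    def __init__(self, sentences, doc_freq, n_docs):
        # A document's tokens are the concatenation of its sentences' tokens.
        self.tokens = [tok for sent in sentences for tok in sent.tokens]
        self.doc_freq = doc_freq
        self.n_docs = n_docs


doc_freq = {"kedi": 1, "ev": 2}  # hypothetical corpus statistics
first = SketchSentence(["kedi", "kedi", "ev"], doc_freq, n_docs=2)
second = SketchSentence(["ev"], doc_freq, n_docs=2)
doc = SketchDoc([first, second], doc_freq, n_docs=2)
print(first.get_tf(), doc.get_tf(), doc.get_idf())
```

Both classes inherit the same get_tf and get_idf methods, so the computation lives in one place and each class only has to supply its own tokens.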