More Prebuilt Models
Release 0.18 adds more prebuilt models to the sadedegel library.
News
- Our main contributor @dafajon has implemented a new `BM25Summarizer`, similar to the TfIdf summarizer. `BM25Summarizer` performs slightly better on short summaries.
- We have packaged two new prebuilt models (refer to the README for model accuracies):
  - tweet profanity classification (`sadedegel.prebuilt.tweet_profanity`)
  - tweet sentiment classification (`sadedegel.prebuilt.tweet_sentiment`)
- We changed the way we report summarizer performance. Instead of a grid search over summarizer options, we now use a random search to decide the optimal summarizer and parameters. Refer to the README for details.
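
To illustrate the difference between the two search strategies, here is a minimal sketch in plain Python. The search space, the parameter names (`k1`, `b`) and the `score` function are hypothetical placeholders chosen for illustration; this is not sadedegel's evaluation code, and the README remains the reference for the actual procedure.

```python
import itertools
import random

# Hypothetical search space: names and values are illustrative only,
# not sadedegel's actual summarizer options.
SEARCH_SPACE = {
    "summarizer": ["tfidf", "bm25", "lexrank"],
    "k1": [1.0, 1.25, 1.5, 1.75, 2.0],
    "b": [0.25, 0.5, 0.75, 1.0],
}


def score(config):
    """Stand-in for one evaluation run (toy objective, higher is better)."""
    bonus = {"tfidf": 0.0, "bm25": 0.1, "lexrank": 0.05}[config["summarizer"]]
    return bonus - abs(config["k1"] - 1.5) - abs(config["b"] - 0.75)


def grid_search(space):
    """Evaluate every combination in the search space."""
    keys = list(space)
    configs = [dict(zip(keys, values))
               for values in itertools.product(*space.values())]
    return max(configs, key=score), len(configs)


def random_search(space, n_trials, seed=0):
    """Evaluate a fixed number of randomly sampled combinations."""
    rng = random.Random(seed)
    configs = [{key: rng.choice(values) for key, values in space.items()}
               for _ in range(n_trials)]
    return max(configs, key=score), len(configs)


best_grid, grid_evals = grid_search(SEARCH_SPACE)
best_random, random_evals = random_search(SEARCH_SPACE, n_trials=20)

print("grid search:  ", grid_evals, "evaluations, best:", best_grid)
print("random search:", random_evals, "evaluations, best:", best_random)
```

The grid search cost grows with the product of all option counts, while the random search cost is fixed by `n_trials`. The trade-off is that a random search may miss the best combination, so its result can score no higher than the exhaustive one on the same space.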
Feature Drop & Deprecation
- `sents` property on `Doc` is dropped. Use `__iter__(Doc)` instead.
- `tf` property on `Doc` is deprecated (will be dropped by 0.18) in favor of the `get_tf` function, which gives a more flexible way to access document level tf vectors.
- `tfidf` function on `Doc` is deprecated (will be dropped by 0.18) in favor of the `get_tfidf` function, which gives a more flexible way to access document level tf-idf vectors.
- `lexrank` external dependency is dropped and `LexRankPureSummarizer` is renamed to `LexRankSummarizer`.
- `set_config`, `get_config`, `describe_config` and `get_all_configs` are dropped in favor of the new configuration implementation.
Others
- `tf` property is now a part of the `TfImpl` class, using default configuration settings to yield a tf vector for a `Doc` or `Sentence`.
- We've updated documentation for our datasets.
- `idf` property is now a part of the `IdfImp` class, using default configuration settings to yield an idf vector for a `Doc` or `Sentence`.
- More default parameters in `default.ini`, based on our summarizer performance.