Minor Performance Enhancements & Tidy Up
Pre-release
In one month we have added a lot to the sadedegel library.
News
- We have resolved an old and major issue caused by improper `from transformers import AutoTokenizer` calls here and there and lazy loading of the sentence boundary detector (sbd). Just to give an idea:
  - The `sadedegel config` CLI call, which shows the sadedegel configuration, took 11 sec in the 0.16.1.1 release and takes 2 sec in 0.16.2.1+.
  - The `from sadedegel import Doc` call (usually the first one made when starting to work with sadedegel) took 9.5 sec in the 0.16.1.1 release and takes 1 sec in 0.16.2.1+.
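The general idea of deferring an expensive import until it is needed can be sketched as below. This is a minimal illustration, not sadedegel's actual code: `lazy_module` and `parse_payload` are hypothetical names, and the standard-library `json` module stands in for a heavy dependency such as `transformers`.

```python
import importlib
from functools import lru_cache


@lru_cache(maxsize=None)
def lazy_module(name: str):
    """Import the module `name` on first use and cache the module object.

    Hypothetical helper for illustration; not part of sadedegel.
    """
    return importlib.import_module(name)


def parse_payload(text: str):
    # `json` stands in for an expensive dependency. It is imported here,
    # when the function is called, rather than when this file is loaded.
    return lazy_module("json").loads(text)


print(parse_payload('{"release": "0.16.2.1"}'))
```

Import cost before and after such a change can be inspected with CPython's `python -X importtime -c "from sadedegel import Doc"`, which prints per-module import timings to stderr.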
Feature Drop & Deprecation
- Old configuration capabilities are deprecated (this time, unfortunately, without prior warnings in earlier releases). A `DeprecationWarning` indicates that you are accessing one of these APIs, which will be removed completely by 0.18.
- Please use the new API `config_context` (`tf_context` and `idf_context` are just simplified wrappers).
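Since Python hides many `DeprecationWarning`s by default, it may help to surface them explicitly while migrating. The sketch below uses only the standard `warnings` module; `old_api` is a hypothetical stand-in for a deprecated configuration call, not a real sadedegel function.

```python
import warnings


def old_api():
    """Hypothetical deprecated function, standing in for an old configuration call."""
    warnings.warn(
        "old_api is deprecated; use config_context instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return "value"


def collect_deprecations(func):
    """Call `func` and return its result plus any DeprecationWarnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure deprecation warnings are recorded rather than filtered out.
        warnings.simplefilter("always", DeprecationWarning)
        result = func()
    deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
    return result, deprecations


result, deprecations = collect_deprecations(old_api)
for w in deprecations:
    print(f"deprecated call detected: {w.message}")
```

Running a script or test suite with `python -W error::DeprecationWarning` turns these warnings into exceptions, which makes remaining uses of the old API easy to find before 0.18.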
Documentation
- CONFIG.md details the configuration of sadedegel.
Others
- `__getitem__` function to access any token of a `Sentence`.
- Iterator on `Sentence` yields all `Token`s in order.
- Default tf method is now `log_norm` instead of `binary`, thanks to @dafajon's most recent summarizer experiments.
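The items above can be illustrated with a toy sketch. `ToySentence` is a hypothetical class that only mimics the indexing and iteration protocol; it is not sadedegel's `Sentence`. The two weighting functions follow the common textbook definitions (binary: 1 if the term occurs; log normalization: `log(1 + count)`), which are assumed here and may differ in detail from sadedegel's implementation.

```python
import math
from collections import Counter


class ToySentence:
    """Hypothetical stand-in for a sentence that holds whitespace tokens."""

    def __init__(self, text: str):
        self.tokens = text.split()

    def __getitem__(self, index):
        # Access any token by position.
        return self.tokens[index]

    def __iter__(self):
        # Yield all tokens in order.
        return iter(self.tokens)

    def __len__(self):
        return len(self.tokens)


def binary_tf(tokens):
    """Binary term frequency: 1.0 for every term that occurs at least once."""
    return {term: 1.0 for term in set(tokens)}


def log_norm_tf(tokens):
    """Log-normalized term frequency, assumed here to be log(1 + raw count)."""
    counts = Counter(tokens)
    return {term: math.log(1 + count) for term, count in counts.items()}


sentence = ToySentence("ali topu at ali")
print(sentence[1], list(sentence))
print(binary_tf(sentence))
print(log_norm_tf(sentence))
```

Under these assumed definitions, a repeated term gets a higher weight with `log_norm` than a term that occurs once, whereas `binary` treats both the same.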