Regular Expression based Simple Word Tokenizer & Code Quality
Pre-release
- ADD: The major change in this release is the simple word tokenizer implementation by @dafajon, added after seeing the issues with the BERT tokenizer. Note that the simple tokenizer is still experimental and not compatible with all summarizers (cluster-based summarizers automatically switch to the BERT tokenizer in order to use BERT embeddings).
- ADD: Introduction of `sadedegel.set_config` to modify some sadedegel configurations, such as the word tokenizer.
- ADD: `tags` are added to `ExtractiveSummarizer` in order to filter summarizers easily (in evaluation etc.).
- ADD: Thanks to Code Inspector, sadedeGel is under constant code quality monitoring, with an initial grade of A (score 94). We will keep it as high as we can as the capabilities of the library grow.
- CHANGE: Downgrade the sklearn dependency back to `0.23.1` to prevent serialization compatibility warnings.
- CHANGE: Score normalization of summarizers is pushed up to the parent abstract class `ExtractiveSummarizer`, improving code quality by reducing repetitive code blocks.
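
The last CHANGE item can be illustrated with a minimal sketch of the refactoring pattern. This is not sadedegel's actual code: `BaseSummarizer`, `_raw_scores`, `LengthSummarizer`, and `PositionSummarizer` are hypothetical names, and normalizing by the sum of scores is an assumption made only for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseSummarizer(ABC):
    """Hypothetical stand-in for a parent abstract class such as ExtractiveSummarizer."""

    def __init__(self, normalize: bool = True):
        self.normalize = normalize

    @abstractmethod
    def _raw_scores(self, sentences: List[str]) -> List[float]:
        """Subclasses return unnormalized, non-negative sentence scores."""

    def predict(self, sentences: List[str]) -> List[float]:
        # Normalization lives here once, instead of being repeated in every subclass.
        scores = self._raw_scores(sentences)
        if not self.normalize:
            return scores
        total = sum(scores)
        # Guard against an all-zero (or empty) score list; assumed sum-normalization.
        return [s / total for s in scores] if total > 0 else scores


class LengthSummarizer(BaseSummarizer):
    """Hypothetical summarizer: scores each sentence by its word count."""

    def _raw_scores(self, sentences: List[str]) -> List[float]:
        return [float(len(s.split())) for s in sentences]


class PositionSummarizer(BaseSummarizer):
    """Hypothetical summarizer: earlier sentences receive higher scores."""

    def _raw_scores(self, sentences: List[str]) -> List[float]:
        n = len(sentences)
        return [float(n - i) for i in range(n)]
```

With this layout each concrete summarizer only implements its raw scoring, and a call such as `LengthSummarizer().predict(sentences)` returns normalized scores without any subclass repeating the normalization step.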