One Big Release
Pre-release
In one month, we have added a lot to the sadedegel library.
News
- We have @doruktiktiklar as the first code contributor from outside the Global Maksimum AI team.
New Capabilities
- ADD: Vocabulary and Token concepts added to the library
  - `Token`: singleton per word (case sensitive) to store unique token features (lower form, shape, document frequency, etc.)
  - New `sadedegel-build-vocabulary` command to manage sadedegel vocabularies.
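The "singleton per word" idea can be illustrated with a small cache keyed by the exact word. This is a minimal sketch, not sadedegel's actual `Token` implementation: the attribute names (`word`, `lower_`, `shape`) and the shape encoding are assumptions made for illustration.

```python
class Token:
    """Hypothetical stand-in for a per-word singleton (not the library's class)."""

    # Maps the exact, case-sensitive word to its single Token instance.
    _cache = {}

    def __new__(cls, word):
        token = cls._cache.get(word)
        if token is None:
            # First time this word is seen: compute its features once.
            token = super().__new__(cls)
            token.word = word
            token.lower_ = word.lower()
            # Assumed shape encoding: X = upper, x = lower, d = digit.
            token.shape = "".join(
                "X" if ch.isupper() else "x" if ch.islower() else "d" if ch.isdigit() else ch
                for ch in word
            )
            cls._cache[word] = token
        return token


a = Token("Ankara")
b = Token("Ankara")   # same object as a
c = Token("ankara")   # different object: the cache key is case sensitive
```

Because every occurrence of a word resolves to the same object, features such as document frequency only need to be stored once per unique word.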
New Summarizers
- ADD: TextRank Summarizer
  - The TextRank summarizer runs Google's PageRank algorithm over a sentence graph whose edge weights come from the cosine distance/similarity between BERT embeddings (the only measure as of this release; more to come).
- ADD: TFIDF Summarizer
  - The TFIDF summarizer uses the element-wise sum of a sentence's tfidf vector as the relevance score of that sentence in a document.
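The two scoring schemes can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function names, the damping factor of 0.85, the fixed iteration count, and the toy 2-dimensional vectors standing in for BERT embeddings are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def textrank_scores(embeddings, damping=0.85, iterations=50):
    """PageRank by power iteration over a cosine-similarity sentence graph."""
    n = len(embeddings)
    if n == 0:
        return []
    # Edge weight between distinct sentences: cosine similarity, clipped at zero.
    weights = [
        [max(cosine_similarity(embeddings[i], embeddings[j]), 0.0) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Rank flowing into sentence i from every sentence j that has outgoing weight.
            rank = sum(
                scores[j] * weights[j][i] / out_weight[j]
                for j in range(n) if out_weight[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def tfidf_sum_scores(tfidf_vectors):
    """Relevance of each sentence = sum of the entries of its tfidf vector."""
    return [sum(vec) for vec in tfidf_vectors]


# Toy vectors: sentences 0 and 1 point in similar directions, sentence 2 does not.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
rank_scores = textrank_scores(toy_embeddings)

toy_tfidf = [[0.0, 0.5, 0.2], [0.1, 0.0, 0.0]]
sum_scores = tfidf_sum_scores(toy_tfidf)
```

In both cases a summary would keep the highest-scoring sentences; the difference is that TextRank scores a sentence by its similarity to the rest of the document, while the TFIDF sum scores it by the weight of its own terms.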
Others
- UPDATE: Fixed some annotator consensus issues in the summary corpus.
- UPDATE: A better command-line interface for summarizer evaluation. Check `sadedegel-summarize evaluate` for more.
- ADD: Sentence-level `tf`, `idf` and `tfidf` embeddings
- ADD: `Doc` has a `tfidf_embeddings` property, similar to the `bert_embeddings` property.
Documentation
- ADD: YouTube webinar videos (in Turkish) on the sadedeGel YouTube Channel
Contribution Guidelines
- ADD: Commit Guidelines
- ADD: New Feature checklist
Feature Drop & Deprecation
- DROP: The code quality guidelines are removed because Code Inspector limits the number of lines per open source project. We might continue with other providers in the future.
- DEPRECATED: `Doc.sents` will be removed by version `0.17`
  - Use `[i]` to access the ith sentence of a document.
  - The `Doc` object now implements `__iter__` to let you iterate over all sentences of a document.
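The access pattern that replaces `Doc.sents` can be sketched with a simplified stand-in class. This is not sadedegel's `Doc`: the naive split on `.` and the wording of the warning are assumptions for illustration only, and sentences here are plain strings.

```python
import warnings


class Doc:
    """Hypothetical, simplified stand-in for a document of sentences."""

    def __init__(self, raw):
        # Naive split on '.', for illustration only; not how the library detects sentences.
        self._sents = [s.strip() for s in raw.split(".") if s.strip()]

    def __len__(self):
        return len(self._sents)

    def __getitem__(self, i):
        # doc[i] returns the ith sentence.
        return self._sents[i]

    def __iter__(self):
        # for sentence in doc: ... walks over all sentences.
        return iter(self._sents)

    @property
    def sents(self):
        # Deprecated accessor kept for backward compatibility.
        warnings.warn(
            "Doc.sents is deprecated; index or iterate over the Doc instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return self._sents


doc = Doc("First sentence. Second sentence.")
first = doc[0]
all_sentences = list(doc)
empty = Doc("")  # an empty document simply yields no sentences
```

Code that currently loops over `doc.sents` can usually switch to looping over `doc` directly.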
Bugfix
- Properly handle empty documents, e.g. `Doc("")` or `Doc('')`.