WINTER OF CODE 2020 PROJECT IDEAS FOR rake_new2 : Read all project ideas here
---- ALL STUDENTS FOR THIS SEASON OF WoC HAVE BEEN SELECTED ALREADY. NO LONGER ACCEPTING PROPOSALS. ---- Head over to DevScript Winter of Code if you want to contribute.
--- AND ---
WINTER OF CODE 2020 PROJECT IDEAS FOR rake_new2 : See all updated and open Issues here. Comment to get assigned. Issues are assigned on a first-come, first-served (FCFS) basis, or to whoever offers the better proposal, idea, or workflow.
rake_new2 is a Python library that enables simple and fast keyword extraction from any text. It helps beginners, or anyone struggling to pick keywords, see which keywords matter most.
HOW IS THIS DIFFERENT FROM ANY OTHER ALGORITHM? : This library returns a weight (score) along with each keyword/key-phrase, which helps you pick out the right key-phrases. Just choose the ones with the higher weights.
┐(︶▽︶)┌
--> Psst! I tested it myself by generating the keywords for my project abstract with this library. My teacher approved them, so yeah, this works. (ʘ ͜ʖ ʘ)
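To make "choose the ones with the higher weights" concrete, here is a minimal sketch in plain Python. It assumes scored keywords arrive as `(score, phrase)` pairs, the shape shown in the usage example in this README. The helper `top_keywords` and its `min_score` and `limit` parameters are hypothetical and not part of rake_new2.

```python
# Scored keywords in the (score, phrase) shape shown in this README's usage example.
scored = {(1.0, "taste"), (1.0, "good"), (4.0, "red apples")}


def top_keywords(scored_keywords, min_score=0.0, limit=None):
    """Hypothetical helper: return phrases ordered from highest to lowest score.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scored_keywords, key=lambda pair: (-pair[0], pair[1]))
    phrases = [phrase for score, phrase in ranked if score >= min_score]
    return phrases if limit is None else phrases[:limit]


print(top_keywords(scored, min_score=2.0))  # ['red apples']
print(top_keywords(scored, limit=2))        # ['red apples', 'good']
```

A score threshold suits short texts, while a fixed `limit` is usually easier to reason about for long documents.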
- Handles repetitive keywords/key-phrases.
- Handles consecutive punctuation.
- Handles HTML tags in text : The user can choose whether to keep HTML tags as keywords too.
Use the package manager pip to install rake_new2.
pip install rake_new2
from rake_new2 import Rake
text = "Red apples are good in taste."
text2 = "<h1> Hello world !</h1>"
# Initialize : default settings, keeping HTML tags, and eliminating HTML tags
rk, rk_new1, rk_new2 = Rake(), Rake(keep_html_tags=True), Rake(keep_html_tags=False)
# Case 1 : Basic keyword extraction
rk.get_keywords_from_raw_text(text)
kw_s = rk.get_keywords_with_scores()
# Returns keywords with degree scores : {(1.0, 'taste'), (1.0, 'good'), (4.0, 'red apples')}
kw = rk.get_ranked_keywords()
# Returns keywords only : ['red apples', 'taste', 'good']
f = rk.get_word_freq()
# Returns word frequencies as a Counter object : {'red': 1, 'apples': 1, 'good': 1, 'taste': 1}
deg = rk.get_kw_degree()
# Returns word degrees as defaultdict object : {'red': 2.0, 'apples': 2.0, 'good': 1.0, 'taste': 1.0}
# Case 2 : Sample case for testing the 'keep_html_tags' parameter. Default = False
print("\nORIGINAL TEXT : {}".format(text2))
# Sub Case 1 : Keeping the HTML tags
rk_new1.get_keywords_from_raw_text(text2)
kw_s1 = rk_new1.get_keywords_with_scores()
kw1 = rk_new1.get_ranked_keywords()
print("Keeping the tags : ",kw1)
# Sub Case 2 : Eliminating the HTML tags
rk_new2.get_keywords_from_raw_text(text2)
kw_s2 = rk_new2.get_keywords_with_scores()
kw2 = rk_new2.get_ranked_keywords()
print("Eliminating the tags : ",kw2)
'''OUTPUT >>
ORIGINAL TEXT : <h1> Hello world !</h1>
Keeping the tags : {'h1', 'hello'}
Eliminating the tags : {'hello world'}
'''
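If you are wondering where numbers like `4.0` for 'red apples' come from, the sketch below reproduces the word degrees and scores from Case 1. It is not the rake_new2 source code. It assumes text is split into phrases at stopwords and punctuation, that a word's degree grows by the length of each phrase it appears in, and that a phrase's score is the sum of its words' degrees. The stopword list here is a tiny stand-in for the NLTK corpus, and the function names are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (assumption); the library relies on NLTK's corpus.
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the"}


def candidate_phrases(text):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Any run of punctuation ends a phrase, so consecutive marks are harmless.
    for chunk in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in chunk.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def score_phrases(text):
    """Return (word frequencies, word degrees, phrase scores) for the text."""
    phrases = candidate_phrases(text)
    freq = Counter(word for phrase in phrases for word in phrase)
    degree = defaultdict(float)
    for phrase in phrases:
        for word in phrase:
            # Each word co-occurs with every word in its phrase, itself included.
            degree[word] += len(phrase)
    scores = {" ".join(p): sum(degree[w] for w in p) for p in phrases}
    return freq, dict(degree), scores


freq, degree, scores = score_phrases("Red apples are good in taste.")
print(degree)  # {'red': 2.0, 'apples': 2.0, 'good': 1.0, 'taste': 1.0}
print(scores)  # {'red apples': 4.0, 'good': 1.0, 'taste': 1.0}
```

Longer phrases score higher under this scheme because every word in them picks up a larger degree, which is why 'red apples' outranks the single words.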
You might come across a stopwords error.
It means that the NLTK stopwords corpus has not been downloaded yet.
To download it, run the command below.
python -c "import nltk; nltk.download('stopwords')"
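If you would rather do this from inside a script, the sketch below downloads the corpus only when it is missing. It uses NLTK's `nltk.data.find` and `nltk.download`. The helper name `ensure_stopwords` and its `find` and `download` parameters are hypothetical. They exist only so the logic can be tried with stand-in functions.

```python
def ensure_stopwords(find=None, download=None):
    """Download the NLTK stopwords corpus only if it is not already installed.

    Returns True if a download was triggered, False if the corpus was present.
    """
    if find is None or download is None:
        import nltk  # imported lazily so stand-ins can be used without NLTK

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing.
        find("corpora/stopwords")
    except LookupError:
        download("stopwords")
        return True
    return False
```

Calling `ensure_stopwords()` once at startup, before creating a `Rake` object, should avoid the error on a fresh machine.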
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update tests as appropriate.
Sankha Subhra Mondal
Student Name | GitHub ID | Merged PR No. | Open source programme name | If DWOC, level of PR |
---|---|---|---|---|
Sabarish Rajamohan | sabarish98 | #16 | Hacktoberfest | -- |
Soham Kar | 2bit-hack | #20 | Hacktoberfest | -- |
Jawen Voon | jawsvk | #26 | Hacktoberfest | -- |
Ananthakrishnan Nair RS | akrish4 | #47 | DWOC | Level-1 |
Tushar Nankani | tusharnankani | #43 | DWOC | Level-3 |