Web Scraping and NLP with Requests, BeautifulSoup, and spaCy

Kellie Heckman. Module 6: Web Scraping. Web Mining and NLP, Northwest Missouri State University.

This module uses BeautifulSoup and spaCy to tokenize and lemmatize an article. The article is retrieved from a URL, and its HTML is saved to a .pkl file.
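The fetch-and-save step described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names (`fetch_html`, `save_html`, `load_html`, `extract_text`) are hypothetical, and it assumes the whole page's text is wanted rather than a specific article element. The third-party imports sit inside the functions that need them so that the pickle helpers can be run offline.

```python
import pickle
from pathlib import Path


def fetch_html(url: str) -> str:
    """Download a page and return its HTML as text (needs network access)."""
    import requests  # imported here so the pickle helpers work without it

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def save_html(html: str, path: Path) -> None:
    """Pickle the raw HTML string to disk."""
    with path.open("wb") as f:
        pickle.dump(html, f)


def load_html(path: Path) -> str:
    """Read the pickled HTML string back from disk."""
    with path.open("rb") as f:
        return pickle.load(f)


def extract_text(html: str) -> str:
    """Strip the markup and return the visible text of the page."""
    from bs4 import BeautifulSoup  # assumes beautifulsoup4 is installed

    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text()
```

A typical flow would be `save_html(fetch_html(url), Path("page.pkl"))` once, then `extract_text(load_html(Path("page.pkl")))` on later runs, so the article is not downloaded again each time. Only unpickle files you created yourself, since `pickle.load` can execute arbitrary code from an untrusted file.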


Complete the tasks in the Python notebook in this repository. Make sure to add and push the .pkl or text file containing your scraped HTML (the notebook specifies this).

Rubric

(Question 1) Article HTML stored in a separate file that is committed and pushed: 1 pt
(Question 2) Article text is correct: 1 pt
(Question 3) Correct (or equivalent, in the case of multiple tokens with the same frequency) tokens printed: 1 pt
(Question 4) Correct (or equivalent, in the case of multiple lemmas with the same frequency) lemmas printed: 1 pt
(Question 5) Correct scores for first sentence printed: 2 pts (1 per function)
(Question 6) Histogram shown with appropriate labelling: 1 pt
(Question 7) Histogram shown with appropriate labelling: 1 pt
(Question 8) Thoughtful answer provided: 1 pt
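Questions 3 to 5 involve counting frequent tokens or lemmas and scoring a sentence. The sketch below shows one plausible way to do both. It is an assumption, not the notebook's definition: the score here is the fraction of a sentence's words that appear in a set of "interesting" (frequent) words, and the function names are hypothetical. The inputs are plain strings; with spaCy they would typically come from `token.text.lower()` or `token.lemma_.lower()` after skipping punctuation, stop words, and whitespace.

```python
from collections import Counter


def most_common_tokens(tokens: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent tokens with their counts."""
    return Counter(tokens).most_common(n)


def sentence_score(sentence_tokens: list[str], interesting: set[str]) -> float:
    """Fraction of the sentence's tokens that are in the interesting set."""
    if not sentence_tokens:
        return 0.0  # avoid dividing by zero on an empty sentence
    hits = sum(1 for tok in sentence_tokens if tok in interesting)
    return hits / len(sentence_tokens)


# Made-up sample data, for illustration only.
tokens = ["laser", "headlight", "laser", "beam", "led", "laser", "headlight"]
top = most_common_tokens(tokens, 2)
interesting = {word for word, _ in top}
score = sentence_score(["laser", "headlight", "be", "bright"], interesting)
```

The same two functions work for lemmas: pass lemma strings instead of token strings. When several words share a frequency, `Counter.most_common` returns them in first-seen order, which is why the rubric accepts equivalent answers.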

Repository files

.gitignore
README.md
page.pkl
web-scraping.html
web-scraping.ipynb

krh5284/web-scraping

Folders and files

Latest commit

History

Repository files navigation

Web Scraping and NLP with Requests, BeautifulSoup, and spaCy

Rubric

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages