This project is a news scraper built with Selenium, a framework for automating web applications through a browser.
The code opens the search page of a news site, applies a filter for a time period, and fills the search field with the keywords. As a result, the site shows the news relevant to those keywords within that period. For each news item, the algorithm collects the title, description, date and full news URL. After collecting all this information, the algorithm visits each stored URL and collects the news content.
- Selenium is a portable framework for testing web applications. It also provides a record-and-playback tool for creating functional tests without the need to learn a test scripting language.
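The two-step flow described above can be sketched roughly as follows. This is only an illustration and not the package's actual code: the CSS selectors, the `NewsItem` class and the function names are hypothetical, every real news site needs its own selectors, and the period filter is left out (it is assumed to be part of the search URL).

```python
from dataclasses import dataclass

# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR,
# written out so the helpers below work with any driver-like object.
CSS = "css selector"


@dataclass
class NewsItem:
    title: str
    description: str
    date: str
    url: str
    content: str = ""


def collect_results(driver, search_url, keywords):
    """Step 1: run the search and read title, description, date and URL."""
    driver.get(search_url)
    box = driver.find_element(CSS, "input[name='q']")  # hypothetical selector
    box.send_keys(keywords)
    box.submit()
    items = []
    for card in driver.find_elements(CSS, "article.result"):  # hypothetical
        link = card.find_element(CSS, "a")
        items.append(NewsItem(
            title=link.text,
            description=card.find_element(CSS, "p.summary").text,
            date=card.find_element(CSS, "time").text,
            url=link.get_attribute("href"),
        ))
    return items


def collect_content(driver, items):
    """Step 2: visit each stored URL and keep the article text."""
    for item in items:
        driver.get(item.url)
        paragraphs = driver.find_elements(CSS, "article p")  # hypothetical
        item.content = "\n".join(p.text for p in paragraphs)
    return items


def scrape(search_url, keywords):
    """Run both steps in a real Chrome session (needs Selenium and Chrome)."""
    from selenium import webdriver  # imported here so the helpers stay testable

    driver = webdriver.Chrome()
    driver.implicitly_wait(10)  # give result pages time to render
    try:
        found = collect_results(driver, search_url, keywords)
        return collect_content(driver, found)
    finally:
        driver.quit()
```

All plain values are read from the result page before any article is opened, because Selenium element references become stale once the browser navigates to another page.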
*Disclaimer: Any misuse of this library/software is the sole responsibility of the user. This code was developed for academic projects, and the data collection was approved by the sites it collects from.
All methods are still under development at the time of writing.
The repo is structured as a package, so it can be installed with pip using the GitHub clone URL. From the command line, type:
pip install git+https://github.com/luizeduardomr/ScrapingNews.v3
To upgrade the package if you have already installed it:
pip install git+https://github.com/luizeduardomr/ScrapingNews.v3 --upgrade
Please note that you should also install the Google Chrome browser for this software to work well.