A simple understanding of this project:
I was looking for a way to extract data from a particular page: Canada's 🍁 Express Entry draws.
I wanted to build a simple forecast system, but the draws were already at #215
(at the time of writing this README), which is a lot of data to copy and paste by hand.
So I came up with this solution, a simple web scraper, which allows us to:
- Take screenshots of the pages
- Export data to a .txt file, a CSV file, or an Excel spreadsheet
- Save time when retrieving data from the website
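As a rough illustration of the CSV export, here is a minimal sketch using only Node's built-in modules. It is not the project's actual exporter: the helper names (`escapeCsv`, `toCsv`), the column names, the sample rows, and the output path are all made up for the example.

```javascript
// Hypothetical sketch: helper names, columns, and sample values are illustrative only.
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Quote a value if it contains a comma, a double quote, or a line break.
function escapeCsv(value) {
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Turn an array of flat objects into CSV text, using the first row's keys as the header.
function toCsv(rows) {
  if (rows.length === 0) return "";
  const headers = Object.keys(rows[0]);
  const lines = [headers.map(escapeCsv).join(",")];
  for (const row of rows) {
    lines.push(headers.map((h) => escapeCsv(row[h] ?? "")).join(","));
  }
  return lines.join("\n") + "\n";
}

// Made-up rows standing in for scraped data.
const sampleRows = [
  { draw: 1, date: "2020-01-01", note: "made-up row" },
  { draw: 2, date: "2020-01-15", note: 'contains, comma and "quotes"' },
];

const csvText = toCsv(sampleRows);
const outFile = path.join(os.tmpdir(), "draws-example.csv");
fs.writeFileSync(outFile, csvText, "utf8");
console.log(`Wrote ${sampleRows.length} rows to ${outFile}`);
```

Quoting fields that contain commas or quotes keeps the file readable by spreadsheet tools even when a scraped cell contains punctuation.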
Todo:
Add Docker support and start the Chromium window in headless mode
- Open settings.json. Its contents should look like this:
{
"add_screenShots": false,
"export_to_excel": false
}
"add_screenShots" -> Controls whether page screenshots are taken
"export_to_excel" -> Controls whether the data is inserted into an Excel spreadsheet
- Enable or disable the options that are useful for your use case
- Navigate to the project directory, install the dependencies, and start the scraper:
$ cd scrapper # Navigate to the project's directory
$ npm install # Install all of the project's dependencies
$ npm start # Start the scraper
Running npm start begins scraping the webpage. A Chromium window should pop up; in later versions the process will be fully headless.
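To make the role of settings.json more concrete, here is a minimal sketch of how the two flags could be read and applied. It is an assumption about how such a scraper might be wired up, not the project's actual code: `loadSettings`, `visitPage`, the `url` argument, and the `page.png` output name are hypothetical. The sketch also launches Chromium in headless mode, which the Todo lists as planned rather than current behavior.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Defaults mirror the two flags documented in settings.json.
const DEFAULT_SETTINGS = { add_screenShots: false, export_to_excel: false };

// Read a settings file, falling back to the defaults for missing keys or a missing file.
function loadSettings(filePath) {
  let parsed = {};
  if (fs.existsSync(filePath)) {
    parsed = JSON.parse(fs.readFileSync(filePath, "utf8"));
  }
  return { ...DEFAULT_SETTINGS, ...parsed };
}

// Hypothetical helper: open `url` in headless Chromium and optionally take a screenshot.
// Needs the puppeteer package from `npm install`; it is loaded lazily so the
// rest of this sketch runs without it.
async function visitPage(url, settings) {
  const puppeteer = require("puppeteer");
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "networkidle2" });
    if (settings.add_screenShots) {
      await page.screenshot({ path: "page.png", fullPage: true });
    }
  } finally {
    // Always close the browser, even if navigation fails.
    await browser.close();
  }
}

// Demo of loadSettings against a temporary file that sets only one flag.
const demoFile = path.join(os.tmpdir(), "settings-example.json");
fs.writeFileSync(demoFile, JSON.stringify({ add_screenShots: true }));
const demoSettings = loadSettings(demoFile);
console.log(demoSettings);
```

With the dependencies installed, `visitPage(someUrl, loadSettings("settings.json"))` would visit the page you pass in and save a screenshot only when "add_screenShots" is true.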
- ExcelJS
- Puppeteer
- Docker [coming in a future version]
If you need any help, send me an email with the subject "Express Entry Scrapper".
Made with <3 by Ruben Costa