Download and install Python
32-bit Windows:
https://www.python.org/ftp/python/3.11.0/python-3.11.0.exe
64-bit Windows:
https://www.python.org/ftp/python/3.11.0/python-3.11.0-amd64.exe
Make sure to check the "Add python.exe to PATH" option at the bottom of the installation window.
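To confirm which Python build ended up on PATH after installation, a short script can report the version and whether it is a 32-bit or 64-bit build. This is only a sketch; describe_interpreter is a hypothetical helper and not part of the scraper.

```python
import struct
import sys


def describe_interpreter() -> str:
    """Return the running Python version and whether it is a 32- or 64-bit build."""
    # A pointer is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
    bits = struct.calcsize("P") * 8
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"Python {version} ({bits}-bit)"


print(describe_interpreter())
```

Save it under any name (for example check_python.py) and run it with python from a new command prompt. If the command is not found, the PATH option was probably not checked during installation.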
Open a command prompt and navigate to the scraper's directory.
The easiest way is to open the scraper's folder and type cmd in the address bar.
Type the following command (you only need to do this once):
pip install -r requirements.txt
You can start the scraper by typing the following command:
python main.py
The website doesn't have any anti-scraping protection.
However, if you wish, you can add a rotating proxy.
In LongueuilQuebecScraper/settings.py, set HTTPPROXY_ENABLED = True and HTTP_PROXY = 'http://username:password@host:port', replacing username, password, host and port with your proxy's details.
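If the proxy username or password contains characters such as @ or :, they have to be percent-encoded before they go into the URL. The sketch below shows one way to build the value; build_proxy_url is a hypothetical helper, the credentials and proxy.example.com are placeholders, and it assumes the project reads HTTP_PROXY from settings.py as described above.

```python
from urllib.parse import quote


def build_proxy_url(username: str, password: str, host: str, port: int) -> str:
    """Return an http:// proxy URL with percent-encoded credentials."""
    # safe="" makes quote() encode '@', ':' and '/' as well, so they cannot
    # be mistaken for URL separators.
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"http://{user}:{secret}@{host}:{port}"


# Placeholder values; replace them with your proxy provider's details.
HTTPPROXY_ENABLED = True
HTTP_PROXY = build_proxy_url("username", "p@ss:word", "proxy.example.com", 8080)
```

With plain alphanumeric credentials the helper is unnecessary and the literal string form works just as well.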
Caching is enabled, so you can run the scraper over multiple sessions without re-downloading the same HTML pages.
However, the scraper does not check for duplicates, so remember to delete the data.csv file before each run.
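Since data.csv has to be deleted before every run, that step can be scripted. A minimal sketch, assuming data.csv is written to the directory the scraper is started from; remove_stale_output is a hypothetical helper and not part of the scraper.

```python
from pathlib import Path


def remove_stale_output(path: str = "data.csv") -> bool:
    """Delete the previous output file; return True if a file was removed."""
    output = Path(path)
    if output.is_file():
        output.unlink()
        return True
    # Nothing to delete, for example on the very first run.
    return False
```

Call remove_stale_output() from the scraper's directory right before starting python main.py. The cached pages are left untouched, so nothing has to be downloaded again.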