General purpose Tutti crawler with an optional pipeline that posts to Slack when a new offer matching a search term gets published on Tutti.ch.

- Set up a new Scrapinghub project.
- Deploy the spider using `shub deploy`.
- Optional: Set `SLACK_WEBHOOK` and `SCRAPINGHUB_API_KEY` in the settings of your project to receive Slack notifications (see the pipeline sketch after this list).
- Run the spider with the desired `searchterm` argument on Scrapinghub (manually or periodically).
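For orientation, a notification pipeline along these lines could look as follows. This is a minimal sketch, assuming the webhook URL is exposed through the `SLACK_WEBHOOK` setting; the class name and the `title`/`url` item fields are illustrative assumptions, not the project's actual code.

```python
import json
import urllib.request

from scrapy.exceptions import NotConfigured


class SlackNotificationPipeline:
    """Posts a short Slack message for every scraped offer (sketch)."""

    def __init__(self, webhook_url):
        self.webhook_url = webhook_url

    @classmethod
    def from_crawler(cls, crawler):
        webhook_url = crawler.settings.get("SLACK_WEBHOOK")
        if not webhook_url:
            # Cleanly disable the pipeline when no webhook is configured.
            raise NotConfigured("SLACK_WEBHOOK is not set")
        return cls(webhook_url)

    def process_item(self, item, spider):
        # "title" and "url" are assumed item fields.
        payload = json.dumps(
            {"text": f"New offer: {item.get('title')} {item.get('url')}"}
        ).encode("utf-8")
        request = urllib.request.Request(
            self.webhook_url,
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        # Blocking call; acceptable for the low volume of new offers.
        urllib.request.urlopen(request)
        return item
```

A pipeline like this would also need to be registered under `ITEM_PIPELINES` in `settings.py` to take effect.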
## Installation
```bash
python3 -m venv .venv
. ./.venv/bin/activate
pip install -r requirements.txt
```
Add an optional `.env` file:
```bash
# Optional: Slack webhook to be called
# SLACK_WEBHOOK=https://hooks.slack.com/services/XXXXXXXX/XXXXXXXX/XXXXXXXX

# Optional: Scrapinghub project & key
# (only makes sense for development)
# SCRAPINGHUB_API_KEY=xxx
# SCRAPY_PROJECT_ID=xxx
```
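How the project picks these values up is an implementation detail; one common approach is python-dotenv, in which case the top of `settings.py` might look like this sketch (assuming python-dotenv is installed):

```python
import os

from dotenv import load_dotenv

# Read the optional .env file from the project root, if present.
load_dotenv()

SLACK_WEBHOOK = os.getenv("SLACK_WEBHOOK")
SCRAPINGHUB_API_KEY = os.getenv("SCRAPINGHUB_API_KEY")
SCRAPY_PROJECT_ID = os.getenv("SCRAPY_PROJECT_ID")
```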
## Running the spider to crawl for a search term
Example 1: Crawl the latest `roomba` offers:

```bash
scrapy crawl tutti -a searchterm=roomba
```
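To trigger the same crawl remotely on Scrapinghub rather than on your machine, the `scrapinghub` client library can schedule a job. The snippet below is a sketch, assuming the API key and project id from the optional `.env` file above:

```python
import os

from scrapinghub import ScrapinghubClient

client = ScrapinghubClient(os.environ["SCRAPINGHUB_API_KEY"])
project = client.get_project(int(os.environ["SCRAPY_PROJECT_ID"]))

# Schedule a single run of the tutti spider with a searchterm argument.
job = project.jobs.run("tutti", job_args={"searchterm": "roomba"})
print(job.key)
```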
Example 2: Crawl the latest 100 pages of all offers and dump the results to a JSON file:

```bash
scrapy crawl tutti -o offers.json -a pages=100
```
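Both `-a` options are plain Scrapy spider arguments. As a rough sketch of how a spider typically consumes them (the tutti.ch listing URL below is an assumption, not the spider's actual request logic):

```python
import scrapy


class TuttiSpider(scrapy.Spider):
    name = "tutti"

    def __init__(self, searchterm=None, pages=1, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.searchterm = searchterm
        self.pages = int(pages)

    def start_requests(self):
        for page in range(1, self.pages + 1):
            # Hypothetical listing URL; the real spider may build it differently.
            url = f"https://www.tutti.ch/de/li/ganze-schweiz?o={page}"
            if self.searchterm:
                url += f"&q={self.searchterm}"
            yield scrapy.Request(url, callback=self.parse)

    def parse(self, response):
        # Offer extraction happens here in the real spider.
        ...
```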