A fun project built with Scrapy. The included spiders extract movies, TV series, and TV movies from IMDb by year and title type. More features are planned.
Python3
>> python3 -m venv venv
>> . ./venv/bin/activate
Anaconda
>> conda create --name venv
>> conda activate venv
- Scrapy
IMDb Scraper extracts the following attributes from the IMDb website. Also, have a look at the example JSON and CSV files extracted by IMDb Scraper.
- Movie Name
- Movie ID
- Movie URL
- Poster
- Year
- Genre
- RunTime
- Certificate
- Rating
- MetaScore
- Plot
- Votes
- Gross
- Director
- Director ID
- Director URL
- Cast
- Cast ID
- Cast URL
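
As a rough illustration of how one scraped record could be represented in Python, the sketch below mirrors the attribute list above as a dataclass. The class name `MovieRecord`, the snake_case field names, and the field types are assumptions made for this example; the keys actually emitted by IMDb Scraper may differ, and all sample values are placeholders.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class MovieRecord:
    """Hypothetical container mirroring the attributes listed above."""

    movie_name: str
    movie_id: str
    movie_url: str
    poster: Optional[str] = None
    year: Optional[int] = None
    genre: List[str] = field(default_factory=list)
    runtime: Optional[str] = None
    certificate: Optional[str] = None
    rating: Optional[float] = None
    metascore: Optional[int] = None
    plot: Optional[str] = None
    votes: Optional[int] = None
    gross: Optional[str] = None
    # A title can have several directors and cast members, so these are lists.
    director: List[str] = field(default_factory=list)
    director_id: List[str] = field(default_factory=list)
    director_url: List[str] = field(default_factory=list)
    cast: List[str] = field(default_factory=list)
    cast_id: List[str] = field(default_factory=list)
    cast_url: List[str] = field(default_factory=list)


# Placeholder values only, not real IMDb data.
record = MovieRecord(
    movie_name="Example Title",
    movie_id="tt0000000",
    movie_url="https://example.com/title/tt0000000/",
    year=2019,
    genre=["Drama"],
)

# Convert to a plain dict, e.g. for writing to JSON or CSV.
row = asdict(record)
```

Optional fields default to `None` or an empty list here because not every title is guaranteed to have a value for every attribute.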
Use the package manager pip to install the following:
>> pip install -r requirements.txt
>> pip install scrapy
Anaconda
>> conda install scrapy -y
Available title types (values for the title_type argument):
- feature
- tv_series
- tv_movie
- tv_episode
- tv_special
- tv_miniseries
- documentary
- video_game
- short
- video
- tv_short
Run the spider with a title type and a year:
>> scrapy crawl imdb_year -a title_type=feature -a year=2019
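
The command above can also be assembled from Python, which makes it easy to reject a mistyped title type before a crawl starts. This is a minimal sketch: the helper name `build_crawl_command` and the validation step are assumptions for illustration and are not part of IMDb Scraper. Only the spider name `imdb_year`, the two `-a` arguments, and the title types come from this README.

```python
from typing import List

# Title types taken from the list in this README.
TITLE_TYPES = {
    "feature", "tv_series", "tv_movie", "tv_episode", "tv_special",
    "tv_miniseries", "documentary", "video_game", "short", "video",
    "tv_short",
}


def build_crawl_command(title_type: str, year: int) -> List[str]:
    """Return the argument list for the `scrapy crawl imdb_year` command."""
    if title_type not in TITLE_TYPES:
        raise ValueError(f"Unknown title type: {title_type!r}")
    return [
        "scrapy", "crawl", "imdb_year",
        "-a", f"title_type={title_type}",
        "-a", f"year={year}",
    ]


command = build_crawl_command("feature", 2019)
print(" ".join(command))
# prints: scrapy crawl imdb_year -a title_type=feature -a year=2019
```

To start the crawl, the list can be passed to `subprocess.run(command, check=True)` from inside the project directory.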
Save the output as a file
>> scrapy crawl imdb_year -a title_type=feature -a year=2019 -o output.csv
>> scrapy crawl imdb_year -a title_type=feature -a year=2019 -o output.json
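
Once a crawl has written `output.csv` or `output.json`, the file can be inspected with the Python standard library. The sketch below is only an example: the helper name `load_rows` is hypothetical, and it relies on Scrapy's default feed formats, where a `.json` feed is a single JSON array of objects and a `.csv` feed starts with a header row. It makes no assumption about the column names.

```python
import csv
import json
from pathlib import Path
from typing import Dict, List


def load_rows(path: str) -> List[Dict[str, object]]:
    """Load scraped items from a .json or .csv feed file as a list of dicts."""
    file_path = Path(path)
    suffix = file_path.suffix.lower()
    if suffix == ".json":
        with file_path.open(encoding="utf-8") as handle:
            return json.load(handle)
    if suffix == ".csv":
        # newline="" lets the csv module handle quoted line breaks itself.
        with file_path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    raise ValueError(f"Unsupported file type: {suffix}")


if __name__ == "__main__":
    feed = Path("output.json")
    # Only runs when a crawl has already produced the file.
    if feed.exists():
        rows = load_rows(str(feed))
        print(f"{len(rows)} items, fields: {sorted(rows[0]) if rows else []}")
```

Note that values read from the CSV feed are always strings, while the JSON feed keeps numbers and lists as written by the spider.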