Project Description

Please download the report (Report.docx) for a thorough explanation of this project. Below you will find the analysis steps and links to the files for each step.

It is interesting to see novels adapted into films. Our first question is whether the ratings of science fiction novels correlate with the ratings of their film adaptations. Our second question is whether either the book ratings or the film ratings correlate with the revenue a film earns.

Analysis Steps

Here are the steps that were taken and some of the problems we found:

Extract data

Get the list of science fiction books adapted into films
- Purnima scraped Wikipedia to get the list of science fiction books adapted into films.
  - Code: web_scrapping/Wikipedia_scrapping-Copy1.ipynb
  - Output: input_csv/newmovielist.csv
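A minimal sketch of how such a Wikipedia list could be parsed with only Python's standard-library `html.parser`. It assumes the adaptations sit in tables with the `wikitable` class and that each table's first row holds the column headers; `WikitableParser` and `parse_adaptations` are hypothetical names, and this is not the code in Wikipedia_scrapping-Copy1.ipynb.

```python
from html.parser import HTMLParser


class WikitableParser(HTMLParser):
    """Collect cell text from every <table class="wikitable"> in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per wikitable; a row is a list of cell strings
        self._depth = 0    # table nesting depth, counted only inside a wikitable
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if self._depth:
                self._depth += 1  # nested table: its rows/cells are not parsed separately
            elif "wikitable" in classes:
                self.tables.append([])
                self._depth = 1
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == "table" and self._depth:
            self._depth -= 1
        elif self._depth == 1 and tag in ("td", "th") and self._cell is not None:
            # Join the text fragments and collapse whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def parse_adaptations(html):
    """Return one dict per data row, keyed by the header row of its table."""
    parser = WikitableParser()
    parser.feed(html)
    parser.close()
    records = []
    for table in parser.tables:
        if not table:
            continue
        header, *body = table
        records.extend(dict(zip(header, row)) for row in body)
    return records
```

Tables without the `wikitable` class are skipped, so navigation boxes and layout tables on the same page do not leak into the list.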
Obtain book ratings
- Purnima scraped GoodReads, using the list from step 1 as search queries, to get the user ratings of these adapted science fiction books.
  - Code: web_scrapping/goodread_scrape-Copy1.ipynb
  - Input: input_csv/newmovielist.csv
  - Output: input_csv/merged_list.csv
- Tigran attempted to obtain book ratings from Amazon Books, but was blocked after a few queries regardless of which IP address and location he tried from.
- Naim attempted to obtain ratings from Chapters Indigo, only to find that the book ratings cannot be scraped (scraping is not authorized).
  - Code: additional/Scrape_Chapters_Indigo.ipynb
  - Input: input_csv/booklist_ratings.csv
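A sketch of the two pieces the GoodReads step needs: turning each title from step 1 into a search query, and pulling the average rating and rating count out of a result. Both the search URL and the rating-line format (`"4.21 avg rating - 1,234,567 ratings"`, with some dash as separator) are assumptions about the GoodReads page, which may change or block automated requests; the function names are hypothetical and this is not the code in goodread_scrape-Copy1.ipynb.

```python
import re
from urllib.parse import urlencode

# Assumed search endpoint (not confirmed by the project notebooks).
SEARCH_URL = "https://www.goodreads.com/search"

# Assumed rating-line format, e.g. "4.21 avg rating - 1,234,567 ratings";
# \W+ accepts any run of spaces and punctuation (hyphen or dash) between the parts.
RATING_RE = re.compile(r"(\d+(?:\.\d+)?)\s+avg rating\W+([\d,]+)\s+ratings?")


def search_url(title):
    """Build a GoodReads search URL using a book title as the query."""
    return SEARCH_URL + "?" + urlencode({"q": title})


def parse_rating(text):
    """Return (average rating, number of ratings) found in text, or None."""
    match = RATING_RE.search(text)
    if match is None:
        return None
    avg, count = match.groups()
    return float(avg), int(count.replace(",", ""))
```

Keeping the parsing separate from the HTTP requests makes it easy to test offline and to add pauses between requests, which matters given how quickly Amazon blocked repeated queries.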
Obtain film ratings
- Callan queried the OMDb API with the list from step 1 to extract movie ratings and revenues.
  - Code: API_manipulation/OMDB_API.ipynb
  - Input: input_csv/newmovielist.csv
  - Output: Transformed_data/movieListDB.csv
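A minimal sketch of an OMDb lookup by title. OMDb's `t` (title) and `apikey` parameters and the `Title`, `Year`, `imdbRating`, `BoxOffice` and `Response` fields are part of its public API; which fields OMDB_API.ipynb actually keeps, and that revenue comes from `BoxOffice`, are assumptions here, and the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode

OMDB_URL = "http://www.omdbapi.com/"


def omdb_url(title, api_key):
    """Build an OMDb query URL that looks a film up by title."""
    return OMDB_URL + "?" + urlencode({"t": title, "apikey": api_key})


def _money(value):
    """Turn a box-office string such as "$1,234,567" into an int; None for "N/A" or missing."""
    digits = value.replace("$", "").replace(",", "") if value else ""
    return int(digits) if digits.isdigit() else None


def parse_omdb(payload):
    """Extract rating and revenue fields from an OMDb JSON response (assumed selection)."""
    data = json.loads(payload)
    if data.get("Response") != "True":
        return None  # OMDb reports "Response": "False" when the film is not found
    rating = data.get("imdbRating", "N/A")
    return {
        "Title": data.get("Title"),
        "Year": data.get("Year"),
        "imdbRating": float(rating) if rating != "N/A" else None,
        "BoxOffice": _money(data.get("BoxOffice")),
    }
```

With a valid key, `parse_omdb(urllib.request.urlopen(omdb_url(title, key)).read())` fetches and parses one film; OMDb uses the string "N/A" for missing values, which is why both numeric fields map it to `None`.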

Transform

Callan merged the book ratings with the results from the OMDb queries to get a combined dataset containing book and movie titles, their ratings, and movie revenues.
- Code: API_manipulation/OMDB_API.ipynb
- Inputs: input_csv/merged_list.csv and Transformed_data/movieListDB.csv
- Output: Transformed_data/CombinedDF.csv and a cleaner Transformed_data/bookListDB.csv
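The merge above is an inner join on the movie title. As a dependency-free sketch (the column names `movie_title` and `Title` are hypothetical defaults, and this is not the notebook's code), it could look like this:

```python
def normalize(title):
    """Lower-case a title and collapse whitespace so trivially different strings match."""
    return " ".join(title.lower().split())


def inner_join(books, movies, book_key="movie_title", movie_key="Title"):
    """Inner-join book rows to movie rows on the normalized movie title.

    books and movies are lists of dicts, e.g. rows from csv.DictReader.
    Rows without a partner on the other side are dropped.
    """
    index = {}
    for movie in movies:
        index.setdefault(normalize(movie[movie_key]), []).append(movie)
    combined = []
    for book in books:
        for movie in index.get(normalize(book[book_key]), []):
            combined.append({**book, **movie})  # movie fields win on name clashes
    return combined
```

In pandas the equivalent is `books.merge(movies, left_on="movie_title", right_on="Title", how="inner")`, again with hypothetical column names.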
We did not have input_csv/merged_list.csv at the beginning, so Naim merged book and movie titles by string similarity, starting from an older version of Transformed_data/CombinedDF.csv. This worked well, but we do not use these results, because it is better to merge the queried title strings directly with their corresponding movie titles.
- Code: additional/Merge_book_movie_titles_by_similarity.ipynb
- Inputs: input_csv/booklist_ratings.csv and Transformed_data/CombinedDF.csv
- Output: Transformed_data/merged_book_and_movie_titles.csv
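Title matching by similarity can be sketched with the standard library's `difflib.get_close_matches`, which scores candidates with `SequenceMatcher.ratio()` (a value in [0, 1]). The 0.6 cutoff is difflib's default, not a value taken from Merge_book_movie_titles_by_similarity.ipynb, and `match_titles` is a hypothetical name.

```python
from difflib import get_close_matches


def match_titles(book_titles, movie_titles, cutoff=0.6):
    """Pair each book title with its most similar movie title, or None if nothing scores >= cutoff."""
    # Compare case-insensitively but return the original movie title.
    lowered = {title.lower(): title for title in movie_titles}
    pairs = {}
    for book in book_titles:
        best = get_close_matches(book.lower(), list(lowered), n=1, cutoff=cutoff)
        pairs[book] = lowered[best[0]] if best else None
    return pairs
```

The cutoff trades recall for precision: a lower value pairs more titles that differ by an article or subtitle, but also produces more false matches, which is one reason exact query strings are preferable when available.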

Load

Tigran loaded the Transformed_data/CombinedDF.csv file to plot the relationships below:
- Code: Plots/plot.ipynb
- Input: Transformed_data/CombinedDF.csv
- Output:
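The plots address the correlation questions from the project description. A small sketch of how those relationships could also be quantified with the Pearson correlation coefficient; the column names passed to `complete_pairs` are up to the caller, and this is not taken from Plots/plot.ipynb.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def complete_pairs(rows, x_key, y_key):
    """Keep rows where both columns have a value (OMDb uses "N/A"), as float pairs."""
    xs, ys = [], []
    for row in rows:
        x, y = row.get(x_key), row.get(y_key)
        if x in (None, "", "N/A") or y in (None, "", "N/A"):
            continue
        xs.append(float(x))
        ys.append(float(y))
    return xs, ys
```

Dropping incomplete rows first matters here because films without a box-office figure would otherwise break the revenue correlations.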
Naim wrote a script (Loading_into_MongoDB/MongoDump.ipynb) that dumps the extracted book and film information into a MongoDB database called adapted_scifi_films_db, creating a books collection and a movies collection.
Once the database exists, Loading_into_MongoDB/MongoLoad.ipynb loads both collections into pandas DataFrames, and an inner join between movies and books recreates the CombinedDF dataframe.
- Code: Loading_into_MongoDB/MongoDump.ipynb and Loading_into_MongoDB/MongoLoad.ipynb
- Input: Transformed_data/bookListDB.csv and Transformed_data/movieListDB.csv
- Output: adapted_scifi_films_db MongoDB database with books and movies collections
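A minimal sketch of the dump and load steps, assuming `pymongo` is installed and a MongoDB server listens on the default local port; the helper names and the connection URI are hypothetical, and the notebooks may differ. `pymongo` is imported inside `dump` so the CSV helper can be used without it.

```python
import csv
import io


def csv_to_documents(text):
    """Parse CSV text into one dict per row, dropping empty cells so they are not stored."""
    return [
        {key: value for key, value in row.items() if value != ""}
        for row in csv.DictReader(io.StringIO(text))
    ]


def dump(books_csv, movies_csv, uri="mongodb://localhost:27017/"):
    """Insert the two CSV files into the books and movies collections of adapted_scifi_films_db."""
    from pymongo import MongoClient  # lazy import: only needed when talking to MongoDB

    db = MongoClient(uri)["adapted_scifi_films_db"]
    with open(books_csv, newline="", encoding="utf-8") as f:
        db["books"].insert_many(csv_to_documents(f.read()))
    with open(movies_csv, newline="", encoding="utf-8") as f:
        db["movies"].insert_many(csv_to_documents(f.read()))
    return db


def load(db, name):
    """Read a whole collection back as a list of dicts, without MongoDB's _id field."""
    return list(db[name].find({}, {"_id": 0}))
```

Usage would be `db = dump("Transformed_data/bookListDB.csv", "Transformed_data/movieListDB.csv")`, then `pandas.DataFrame(load(db, "books"))` and `pandas.DataFrame(load(db, "movies"))`, joined with `DataFrame.merge(..., how="inner")` on whichever title column the two CSV files share.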

Extra

We checked whether the GoodReads ratings we web-scraped were similar to those in a Kaggle dataset processed about a year earlier. They are: the ratings did not change much (see additional/Kaggle_merge_with_Adapted_MoviesList.ipynb).
- Plots: kaggle_vs_our_ratings.png and kaggle_vs_our_ratings_diff.png
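The comparison behind kaggle_vs_our_ratings_diff.png amounts to per-title rating differences. A sketch that computes them and summarizes how much the ratings moved, assuming both rating sets are available as `{title: rating}` dicts keyed by the same title strings; the function names are hypothetical.

```python
from statistics import mean


def rating_differences(ours, kaggle):
    """Per-title difference (ours minus Kaggle) for titles present in both dicts."""
    shared = sorted(set(ours) & set(kaggle))
    return {title: ours[title] - kaggle[title] for title in shared}


def summarize(diffs):
    """Number of matched titles, mean absolute difference and largest absolute difference."""
    values = [abs(d) for d in diffs.values()]
    if not values:
        return {"n": 0, "mean_abs": None, "max_abs": None}
    return {"n": len(values), "mean_abs": mean(values), "max_abs": max(values)}
```

A small mean absolute difference, together with a small maximum, supports the conclusion that the ratings changed little over the year.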

naim-panjwani/books_and_films

Folders and files

Latest commit

History

Repository files navigation

Project Description

Analysis Steps

Extract data

Transform

Load

Extra

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 4

Languages

Packages