In this machine learning project, we cleaned and analyzed a dataset from *The Movie Database (TMDb)* and predicted two target variables: revenue (numerical) and profitability (categorical).
The goal is to explore the data processing, analysis, regression, and classification techniques needed to better predict the revenue or profitability of a movie before its production.
The dataset contains around 5000 movies with 22 features and was obtained from Kaggle [1]. The information available about each movie includes its budget, revenue generated, genres, rating, vote count, popularity, actors and actresses, and many more. For this project, however, we used an uncleaned version of the dataset.
In this project, we will clean and analyze this dataset to determine whether the available information about a movie can predict its total revenue. We will then attempt to predict whether a movie's revenue will exceed its budget (profitability). For each prediction, we will also test two different models to check which one predicts the target variable better.
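As a minimal sketch of how the profitability target could be derived from the budget and revenue features, the snippet below labels a movie as profitable when its revenue exceeds its budget. The sample rows, the field names `budget` and `revenue`, the helper `add_profitability`, and the choice to skip records with a missing or zero budget or revenue are all assumptions made for illustration; they are not taken from the project's actual code.

```python
# Hypothetical sample rows; values are made up for illustration only.
movies = [
    {"title": "A", "budget": 100, "revenue": 250},
    {"title": "B", "budget": 200, "revenue": 150},
    {"title": "C", "budget": 0, "revenue": 500},  # zero budget: treated as missing
]


def add_profitability(rows):
    """Return copies of rows with a binary 'profitable' label.

    Assumption: a missing or zero budget/revenue marks an unusable record,
    so such rows are skipped rather than labeled.
    """
    labeled = []
    for row in rows:
        budget = row.get("budget")
        revenue = row.get("revenue")
        if not budget or not revenue:
            continue
        # 1 if revenue exceeds budget, otherwise 0
        labeled.append({**row, "profitable": int(revenue > budget)})
    return labeled


labeled_movies = add_profitability(movies)
for movie in labeled_movies:
    print(movie["title"], movie["profitable"])
```

A binary label like this gives the classification models their target, while the raw revenue value remains the target for the regression models.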
The results of this project can help movie production teams evaluate a movie idea before it moves on to the production phase.
For detailed information, head over to: