This project extends the one I worked on as a Data Challenge Finalist @ Meta. The dataset I was provided with was a great resource, and I wanted to see what else I could do with it and what I could learn along the way.
While my project @ Meta was a simple EDA to make a content pitch for a fictitious entertainment platform called "Zuckflix", this project takes a deeper look at the dataset (with a helping hand from some other datasets) from the perspective of gender representation on the big screen. It uses more complex analysis methods, such as hypothesis testing and simple NLP methods.
Having felt lost about how to clean datasets when starting personal projects, I wrote a Medium article explaining my rigorous methodology, which I have also applied to this project. Based on the analysis I've done on this dataset, I'm currently finalizing a Medium article called "Exploring Gender Representation in Films w/ Simple NLP + Hypothesis Testing".
This project has taught me about NLP, an area I've become more interested in, and about more complex hypothesis testing methods beyond those usually taught in introductory statistics classes.
Here's my data cleaning article: https://towardsdatascience.com/how-to-clean-your-data-in-python-8f178638b98d