The purpose of this project is to visualise observations from an analysis of scraped tweet-level data from 2008 to 2020 for Barack Obama, Joe Biden and Donald Trump. The 'Exploratory_analysis_twitter_obama_trump_biden.ipynb' notebook details how I generated a WordCloud fitted to the shape of Obama, Trump and Biden.
- Data Visualization
- WordCloud generator
- Web Scraping
- Python
- GetOldTweets3
- PIL
- json
- wordcloud
- pandas
- numpy
- matplotlib
- seaborn
- glob
- csv
- time
- re
- datetime
The project is broken down into 7 key sections within the "Exploratory_analysis_twitter_obama_trump_biden.ipynb" notebook:
- Scraping tweet-level data from Obama, Trump and Biden's Twitter feeds via the "GetOldTweets3" library. Basic Twitter API access only returns up to 3,200 of a user's most recent tweets, so the 'GetOldTweets3' library is a useful workaround: it obtains the necessary tweet data by web scraping the users' Twitter feeds rather than through an API connection, so it is not bound by that limit.
- Assessing the data to identify which cleaning steps are required.
- Cleaning the dataset to make it fit for exploratory analysis.
- Exploring the dataset to uncover insights. This is where I produce a WordCloud fitted to the shape of Obama, Trump and Biden.
- Exporting the cleaned DataFrame to a Google Sheet.
- Exporting the DataFrame to CSV for Data Studio usage.
- Exporting the original cleaned dataset.
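The cleaning and WordCloud steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's code: the regex rules, the `BASIC_STOPWORDS` set and the helper names (`clean_tweet`, `word_frequencies`, `masked_wordcloud`) are hypothetical, and `mask_path` is assumed to point at a silhouette image on a white background. The shaped cloud relies on the `mask` parameter of the wordcloud library's `WordCloud` class, which leaves pure-white mask pixels empty so the words fill the silhouette.

```python
import re
from collections import Counter

# Illustrative cleaning rules (assumptions, not necessarily the notebook's steps).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")

# A tiny hypothetical stopword list; "amp" catches the "&amp;" HTML entity.
BASIC_STOPWORDS = {"the", "and", "for", "our", "you", "are", "rt", "amp"}


def clean_tweet(text: str) -> str:
    """Lower-case a tweet, then strip URLs, @mentions and any non-letter
    characters ('#', digits, punctuation, emoji)."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return " ".join(text.split())


def word_frequencies(tweets, stopwords=BASIC_STOPWORDS, min_len=3):
    """Count cleaned words across tweets, skipping stopwords and short words."""
    counts = Counter()
    for tweet in tweets:
        for word in clean_tweet(tweet).split():
            if len(word) >= min_len and word not in stopwords:
                counts[word] += 1
    return counts


def masked_wordcloud(frequencies, mask_path, out_path):
    """Render word frequencies inside the shape in mask_path.

    Requires the third-party wordcloud, Pillow and numpy packages, so they
    are imported only when this function is called.
    """
    import numpy as np
    from PIL import Image
    from wordcloud import WordCloud

    # Pure-white pixels in the mask are treated as outside the shape.
    mask = np.array(Image.open(mask_path))
    cloud = WordCloud(background_color="white", mask=mask,
                      contour_width=1, contour_color="black")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)
    return cloud
```

Passing precomputed frequencies (rather than raw text to `generate`) keeps the cleaning logic in one place, so the same counts can feed both the word cloud and any other exploratory charts.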
You will need to pip install the following libraries in order to scrape tweets and generate a word cloud.
The repository contains the following files:
- Web scraping, cleaning and exploratory analysis: "Exploratory_analysis_twitter_obama_trump_biden.ipynb"
- Explanatory analysis (Jupyter Notebook and HTML Slidedeck) : "Explanatory_analysis_twitter_obama _trump_biden.ipynb", "Explanatory_analysis_twitter_obama _trump_biden.slides.html"
- Clean csv file for Jupyter Notebook usage: "biden_trump_obama_clean_2008_2020_original"
- Clean csv file for Google Sheets usage: "biden_trump_obama_clean_2008_2020_gspread"
- Clean csv file for DataStudio usage: "biden_trump_obama_clean_2008_2020_datastudio"
The following blog post was extremely useful in providing an overview of how to extract tweet-level data with the 'GetOldTweets3' library.
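For reference, the scraping step might look like the sketch below. It assumes GetOldTweets3's `TweetCriteria`/`TweetManager` interface; the helper names (`scrape_user`, `tweets_to_rows`, `write_csv`), the `FIELDS` column selection and the output file name are hypothetical rather than taken from the notebook. Because the library scrapes Twitter's web pages instead of calling the API, it may stop returning results if the site changes.

```python
import csv

# Columns kept from each scraped tweet (a hypothetical selection of the
# attributes exposed on GetOldTweets3 tweet objects).
FIELDS = ["id", "username", "date", "text", "retweets", "favorites"]


def scrape_user(username, since, until, max_tweets=None):
    """Scrape a user's tweets between two 'YYYY-MM-DD' dates."""
    import GetOldTweets3 as got  # third-party; imported only when scraping

    criteria = (
        got.manager.TweetCriteria()
        .setUsername(username)
        .setSince(since)
        .setUntil(until)
    )
    if max_tweets:
        criteria = criteria.setMaxTweets(max_tweets)
    return got.manager.TweetManager.getTweets(criteria)


def tweets_to_rows(tweets):
    """Flatten tweet objects into dicts holding only the FIELDS columns."""
    return [{field: getattr(tweet, field) for field in FIELDS} for tweet in tweets]


def write_csv(rows, path):
    """Write the rows to a UTF-8 CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


# Example usage (requires network access and the GetOldTweets3 package):
# rows = tweets_to_rows(scrape_user("BarackObama", "2008-01-01", "2020-12-31"))
# write_csv(rows, "obama_tweets_raw.csv")
```

Keeping the flattening in `tweets_to_rows` separate from the scrape means the CSV export can be exercised without hitting Twitter at all.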