Data analytics project for Career Foundry: an analysis for Instacart, an online grocery store app, aimed at increasing sales.
Instacart requested an initial exploratory analysis of customer purchasing behaviors. The purpose of the analysis is to inform a marketing strategy that targets specific customer profiles with appropriate products.
The Instacart data included the following data sets: orders, orders_products_prior, products, customers, and departments.
- Data Dictionary here
- Career Foundry Data Set here
- Instacart Online Grocery Shopping Dataset 2017 here
*Note: The Career Foundry project brief asked for a summary report in Excel. In practice, this would be delivered as a PowerPoint presentation and/or a Tableau or Power BI dashboard.
- Summary Report and Visualizations here
The analysis is stored in a project folder containing the following subfolders:
- 01 Project Management: Contains the Instacart Project Brief
- 02 Data: Separated into Original Data and Prepared Data, which contain the original data frames and the data frames after cleaning and preparation for analysis, respectively. NOTE: This folder has not been included.
- 03 Scripts: Contains all the Python code for the entire analysis process. The scripts are organized to follow the flow of the Career Foundry exercises.
- 04 Analysis: Contains the visualizations produced by the Python analysis and used to develop insights in the final report.
- 05 Sent to client: Contains the final report in Excel. *As noted above, the Excel format was required by Career Foundry; the visualizations and insights could alternatively be presented as a PowerPoint or dashboard.
- Python coding: consistency checks, transforming data, groupby() function, aggregations, creating new variables, exclusion flags, subsetting, data frame merges, merge flags, user-defined functions, loc function, for loops, crosstabs, random sampling, visualization
- Visualizations: Used matplotlib.pyplot, seaborn, and scipy to create bar plots, histograms, and scatterplots in Python
- Analytical Process: Population flow; data cleaning within exploratory data analysis (EDA): data frame shape, data types, outliers, duplicates, missing values, consistency checks, and preparation for merging data frames; subsetting data; Excel reporting; project organization
- Business Skills: demographic profiling, data security (PII specific), data ethics, marketing segmentation, summary reports, translating complex data into marketing and sales insights
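
Several of the techniques listed above (consistency checks, aggregations, merge flags, exclusion flags, and subsetting) can be illustrated with a minimal pandas sketch. This is not the project's actual script: the toy data, the `user_order_count` and `low_activity` column names, and the `MIN_ORDERS` threshold are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the orders and orders_products_prior data sets.
# All values here are made up for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 3],   # order 3 is duplicated on purpose
    "user_id": [10, 10, 20, 20],
})
order_products = pd.DataFrame({
    "order_id": [1, 1, 2, 3, 4],  # order 4 has no match in orders
    "product_id": [100, 101, 100, 102, 103],
})

# Consistency checks: remove full duplicates and count missing values.
orders = orders.drop_duplicates()
missing_counts = orders.isnull().sum()

# Aggregation with groupby(): number of distinct orders per user.
orders["user_order_count"] = (
    orders.groupby("user_id")["order_id"].transform("nunique")
)

# Merge flag: indicator=True adds a "_merge" column that records whether
# each row came from both frames, the left only, or the right only.
merged = orders.merge(order_products, on="order_id", how="outer", indicator=True)

# Exclusion flag: mark users below a hypothetical activity threshold.
MIN_ORDERS = 2  # assumed cutoff, not taken from the project brief
merged["low_activity"] = merged["user_order_count"] < MIN_ORDERS

# Subsetting: keep fully matched rows for users who are not excluded.
analysis_df = merged.loc[(merged["_merge"] == "both") & ~merged["low_activity"]]

print(merged["_merge"].value_counts())
print(analysis_df)
```

Inspecting the `_merge` counts before subsetting is a quick way to see how many rows failed to match, which helps when documenting the population flow.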
While “married” adults make up the largest share of the customer base, the family status data does not give an accurate picture of customer demographics. The options for family status did not include people living with roommates, alternative family structures, or unmarried partners living together. As a result, the sample represents straight nuclear family structures better than queer families and is not equitable. It is also not possible to source fully accurate relationship or living status data. For these reasons, focusing on known attributes, such as age, is more accurate.
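
As a rough illustration of profiling by a known attribute such as age, the sketch below derives an age group with `loc` and cross-tabulates it against department. The age cutoffs, the `age_group` labels, the `department` column, and the sample rows are all hypothetical; the brackets actually used in the analysis are not stated here.

```python
import pandas as pd

# Made-up customer rows for illustration only.
customers = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "age": [22, 35, 47, 68, 71],
    "department": ["produce", "dairy", "produce", "bakery", "produce"],
})

# Derive a new variable with loc. The cutoffs are assumed, not sourced.
customers.loc[customers["age"] < 40, "age_group"] = "Under 40"
customers.loc[
    (customers["age"] >= 40) & (customers["age"] < 65), "age_group"
] = "40-64"
customers.loc[customers["age"] >= 65, "age_group"] = "65+"

# Crosstab: count of customers per age group and department.
age_by_department = pd.crosstab(customers["age_group"], customers["department"])
print(age_by_department)
```

Because every age falls into exactly one bracket, the derived column has no missing values, which is worth confirming before the crosstab is used in a report.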