An e-commerce retail company suspects that there is a decline in customers purchasing product on their portal. The company collects customer information as a part of the sign-up process. The historical data about customers and sales is available from 2010 to 2014.
Senior leadership has asked the analytics team to validate the suspicion using data. In addition, the leadership wants to leverage historical data to perform demographic analysis. The outcome of the demographic analysis will be utilized by the marketing team to perform segmentation and targeting with greater efficacy.
For demography analysis, following data points are needed.
- Customer’s address
- Average customer spend
- Age
- Gender
- Marital Status
- Annual income Since customer information is sensitive, the data resides in company’s secured SQL server. The analytics team has been given a read access to this SQL database to perform the analysis. It is known that information collected when a customer signs up is available in the dimension table named “dbo.DimCustomer”. The shopping related information is available in the fact table named “dbo.FactInternetSales”
Apart from the pre-installed Microsoft Office Suite, we need to install 2 important softwares.
-
Power BI Desktop is used which is an open source Data Visualization software created by Microsoft as part of the Microsoft Business Intelligence Toolkit.
1.1 Download Power BI from here.
1.2 Step by step installation guide is available. -
Microsft SQL Server, a relational database management system developed by Microsoft is used for Data Storage, Retrieval and data transformation.
2.1 Download the SQL Server 2019 Developer version
2.2 SQL Server 2019 Developer Step by step installation guide. -
Microsoft SQL Server Management Studio, a software application first launched with Microsoft SQL
3.1 Download SQL Server Management Studio 18.10
3.2 SQL Server Management Studio - Step by step installation guide.
SELECT [CustomerKey]
,[FirstName]
,[LastName]
,[BirthDate]
,[MaritalStatus]
,[Gender]
,[YearlyIncome]
,[AddressLine1]
,[DateFirstPurchase]
FROM [AdventureWorksDW2019].[dbo].[DimCustomer]
SELECT
[CustomerKey]
,[OrderQuantity]
,[UnitPrice]
,[DiscountAmount]
,[CustomerPONumber]
,[OrderDate]
FROM [AdventureWorksDW2019].[dbo].[FactInternetSales]
2.1 Data type change : It can be seen that the data type of “BirthDate” column in Customer table is Date\Time
For the analysis, time component is not required. Using Transform option, the data type is changed from “Date\time” to “Date”
2.2 Turn on Column Quality and Column Distribution : Column Quality and Column Distribution options are enabled from View section. These options help in scanning the data for any errors or blank values
2.3 Removing null values : The CustomerPONumber column from Sales table has null values. Therefore, it is not going to be useful in the analysis and is deleted.
2.4 Adding a custom column : A column that contains Sales Amount is needed for the analysis. Sales amount can be calculated using Unit Price, Order Quantity and applying Discounts. The same is formulized to create the custom column as shown below. Data type of SalesAmount is changed from “Any” to “Decimal” using Transform option
In order to apply slicer across years, months and days a calendar table is needed. DAX provides a function called “Calendar” which needs start date and end date as input. To create the table with dynamic start and end date, following DAX code is used.
Calendar_Table =
//Calculate Start Date
var start_date = MIN(Sales_Table[OrderDate])
//Calculate End Date
var end_date = MAX(Sales_Table[OrderDate])
RETURN
// Create a dynamic calendar table
CALENDAR ( start_date, end_date)
In addition, new columns are added which represent year, month, week and day. This is to provide ability to slice the data across multiple periods.
To be able to use all three tables, relationships must be defined across them. Customer and Sales table are related by CustomerKey column. Sales and Calendar Table are related by Date and OrderDate columns. The final data model is as shown below.
1. Using Maps visual to show customer’s location : "AddressLine1" column is used as Location. SalesAmount is added in Bubble size section to show size of purchase made relative to other regions.
2. Area chart visual for average customer trend : To calculate Average Customer Spend, number of unique customers is needed. A measure in Sales Table is created using following code and then displayed using area chart.
Average Spend Per Customer =
var number_of_unique_customers = DISTINCTCOUNT(Sales_Table[CustomerKey])
RETURN DIVIDE(SUM(Sales_Table[SalesAmount]), number_of_unique_customers,0)
3. Customer’s age at time of purchase : A new column is added in Sales table to represent age of the customer. The age is calculated as difference between birth date and date of first purchase. DAX function DATEDIFF is used.
Card visual is added which displays the calculated column.
4. Gender Ratio : A new measure in Sales table is used to calculate gender ratio. It calculates ratio of males to entire data set as shown below. The measure is then displayed as a card.
5. Marital status : Similar to Gender ratio, the marital status percentage is calculated and displayed.
6. Average Annual Income : New column is created in Sales table which refers the average income column in customer table. Card visual is used to display.
- Based on the historical data analyzed, it is conclusive that on average the customers are spending lesser on the company’s portal. The average spend per customer has reduced from $3102 in 2010 to $55 in 2014, a decrease of more than 98%
- In addition, the demographic profile revealed that the average customer is a 43 years old male who is married and earns around $60,000 annually. These data points are shared with marketing team for their action.