This project aims to analyze product data from an online sports clothing company in order to provide recommendations for improving revenue. The dataset provided includes five tables: info
, finance
, reviews
, traffic
, and brands
. The primary key for all tables is product_id
. Below is a brief description of each table:
product_name
: Name of the productproduct_id
: Unique ID for productdescription
: Description of the product
product_id
: Unique ID for productlisting_price
: Listing price for productsale_price
: Price of the product when on salediscount
: Discount, as a decimal, applied to the sale pricerevenue
: Amount of revenue generated by each product, in US dollars
product_name
: Name of the productproduct_id
: Unique ID for productrating
: Product rating, scored from 1.0 to 5.0reviews
: Number of reviews for the product
product_id
: Unique ID for productlast_visited
: Date and time the product was last viewed on the website
product_id
: Unique ID for productbrand
: Brand of the product
In this notebook, we first count the number of missing values in the dataset. We use a SQL query to join the info
, finance
, and traffic
tables and count the non-missing entries for the description
, listing_price
, and last_visited
columns. The result shows the total number of rows and the counts for each column.
Next, we analyze the pricing of Nike and Adidas products in the dataset. We run a SQL query to select the brand, listing_price as an integer, and the count of all products in the finance
table. We join the brands
table to the finance
table on the product_id
and filter for products with a listing_price greater than zero. The results are grouped by brand and listing_price, and sorted by listing_price in descending order.
To further analyze the pricing data, we assign labels to different price ranges and group the results by brand and label. We also include the total revenue for each price range and brand. We use a SQL query to select the brand, count of products, total revenue, and create four labels based on the price range. The results are grouped by brand and price_category and sorted by total_revenue.
Please refer to the notebook for the detailed SQL queries and their results.