This project focuses on analyzing game-related user data to evaluate the effectiveness of different onboarding variants using A/B testing and the Chi-Square test. The goal is to determine whether variant changes influence user engagement and purchases. For this analysis, I utilized SQL to gather insights and perform statistical tests.
- Imported CSV files into SQL tables using the
CREATE TABLE
statement and loaded data using the file paths. - The dataset consists of artificially generated data using ChatGPT, simulating user behaviors and actions in a game environment.
- Rows: 100,172
- Columns: 4 (
exp_name
,user_id
,exp_date
,variant
) - Details:
exp_name
: Contains a single experiment labeled "Onboarding".user_id
: Contains 100,172 distinct values.exp_date
: Ranges from 2020-01-01 to 2020-02-19, with 50 unique dates.variant
: Two groups - Variant 1 and Control Group.
- Rows: 124,185
- Columns: 3 (
user_id
,action
,action_date
) - Details:
action
: Includes "onboarding complete" and "email_optin".user_id
: 87,124 distinct values.action_date
: Ranges from 2020-01-01 to 2020-02-19, with 50 unique dates.
- Rows: 12,897
- Columns: 3 (
user_id
,purch_date
,amount
) - Details:
user_id
: 9,969 distinct values.purch_date
: Ranges from 2020-01-01 to 2020-03-11, with 70 unique dates.amount
: Contains 6 distinct purchase values: 2.99, 10, 25, 50, 100, 999.
- Conducted a Chi-Square test online to compare the success rates between Variant 1 and the Control Group.
- Result: Variant 1 was significantly more successful.
- Used SQL functions (
AVG
,STDDEV
) to compute mean and standard deviation for further statistical analysis.
- Recalculated the Chi-Square test with SQL-derived metrics.
- Result: No significant difference between Variant 1 and Control after recalculating.
- Joined the
game_actions
table to assess the total number of users who completed onboarding. - Result: The null hypothesis was rejected, indicating a significant difference between the two variants.
- Redefined the success metric as user purchases and reran the analysis.
- Result: Significant difference between the two variants based on purchase behavior.
- Performed an analysis to evaluate differences over a 7-day interval.
- Result: No significant difference between the variants in a week-by-week comparison.
- Analyzed differences between two distinct time periods.
- Result: Significant differences observed, but reliability may be affected by external factors.
- Compared user behavior between Canada and the USA.
- Result: Differences were noted, but conclusions may be unreliable due to varying population sizes and demographic factors.
- Variant 1 initially appeared to be more effective than the Control group based on onboarding actions.
- However, when shifting the metric to actual purchases, the difference between the variants became less clear.
- The time-based analysis revealed fluctuations that could be due to external factors, making it necessary to control for variables like seasonality or promotional events.
- Country-specific analyses suggest that demographic factors may influence game engagement and purchases.
- SQL: Data extraction, transformation, and analysis.