This project analyzes employee_attrition.csv, which contains detailed information about each employee, such as age, department, education, and whether they stay at the company (attrition). The data was provided by the professor who taught me IST707 Applied Machine Learning at Syracuse University.
The analysis process is as follows:
- Data cleaning and pre-processing: Judge which variables are necessary, and handle NAs and outliers with appropriate methods.
- Exploratory data analysis (EDA): Compute descriptive statistics and apply data visualization to check for interesting data patterns.
- Run an association rule mining algorithm using default settings as a baseline model.
- Fine-tune the model by experimenting with different algorithm hyper-parameters and discuss how tuning those hyper-parameters could impact the model performance (e.g. overfitting or underfitting).
- Output and present the top 5 rules that predict which employees stay and which leave.
- Interpret the chosen association rules and discuss why they are interesting and significant.
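
The rule-mining and tuning steps above can be illustrated with a minimal sketch. It is not the notebook's code: the transactions are hypothetical toy data (not rows from employee_attrition.csv), the function names `support` and `mine_rules` are made up for illustration, and it uses brute-force enumeration instead of an optimized algorithm such as Apriori. It only shows how support, confidence, and lift are computed for rules whose right-hand side is an attrition outcome, and where the `min_support` and `min_confidence` hyper-parameters come in.

```python
from itertools import combinations

# Hypothetical toy transactions, NOT rows from employee_attrition.csv.
# Each employee is represented as a set of "Column=Value" items.
transactions = [
    {"OverTime=Yes", "Attrition=Yes"},
    {"OverTime=Yes", "Attrition=Yes"},
    {"OverTime=No", "Attrition=No"},
    {"OverTime=No", "Attrition=No"},
    {"OverTime=Yes", "Attrition=No"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, targets, min_support=0.2,
               min_confidence=0.6, max_lhs=2):
    """Brute-force search for rules of the form lhs -> target.

    Illustrative only: a real run would use an Apriori-style
    implementation that prunes the candidate itemsets.
    """
    items = sorted(set().union(*transactions))
    rules = []
    for rhs in targets:
        rhs_support = support({rhs}, transactions)
        if rhs_support == 0:
            continue
        # Do not allow the target column itself on the left-hand side.
        column = rhs.split("=")[0]
        candidates = [i for i in items if i.split("=")[0] != column]
        for k in range(1, max_lhs + 1):
            for lhs in combinations(candidates, k):
                joint = support(set(lhs) | {rhs}, transactions)
                if joint < min_support:
                    continue
                confidence = joint / support(lhs, transactions)
                if confidence < min_confidence:
                    continue
                rules.append({
                    "lhs": lhs,
                    "rhs": rhs,
                    "support": joint,
                    "confidence": confidence,
                    "lift": confidence / rhs_support,
                })
    # Highest lift first, so the strongest associations come out on top.
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


rules = mine_rules(transactions, ["Attrition=Yes", "Attrition=No"])
for r in rules[:5]:
    print(r["lhs"], "->", r["rhs"],
          f"support={r['support']:.2f}",
          f"confidence={r['confidence']:.2f}",
          f"lift={r['lift']:.2f}")
```

Lowering `min_support` or `min_confidence` lets through many more rules, including ones backed by only a handful of employees that may not generalize. Raising them leaves fewer, more reliable rules but can hide patterns for the rarer outcome.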
Please use the link below to view the notebook if you receive the 'Sorry, something went wrong. Reload?' error message: https://nbviewer.jupyter.org/