This was a group project comparing the effectiveness of supervised learning on several multivariate datasets; my part used a random forest model. I computed the feature importance of the predictor variables and examined how it affects the error rate (RMSE), using the Student Performance dataset to illustrate the importance of each predictor. I implemented this in Python with NumPy, SciPy, scikit-learn, and pandas, and used matplotlib and seaborn for plotting the figures.
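As a rough illustration of the feature-importance step, here is a minimal sketch. It is not the project's actual code: it uses synthetic regression data as a stand-in for the Student Performance dataset, and the helper name `fit_and_score`, the hyperparameters, and the choice to refit on the three highest-ranked predictors are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, seed=0):
    """Fit a random forest on a train split and return (model, test RMSE).

    Hypothetical helper for this sketch only.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    rmse = float(np.sqrt(mean_squared_error(y_test, model.predict(X_test))))
    return model, rmse


# Assumption: synthetic data with 8 predictors, 3 of them informative,
# standing in for a real table of predictors and a numeric target.
X, y = make_regression(
    n_samples=400, n_features=8, n_informative=3, noise=5.0, random_state=0
)

# Baseline: all predictors.
model, rmse_all = fit_and_score(X, y)

# Impurity-based importances, one per predictor, normalized to sum to 1.
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]  # most important first

# Refit using only the three highest-ranked predictors and compare RMSE.
top = ranking[:3]
_, rmse_top = fit_and_score(X[:, top], y)

for idx in ranking:
    print(f"feature {idx}: importance {importances[idx]:.3f}")
print(f"RMSE with all features:   {rmse_all:.2f}")
print(f"RMSE with top 3 features: {rmse_top:.2f}")
```

Comparing the two RMSE values gives a simple view of how much predictive accuracy depends on the highest-ranked predictors. With a real dataset, categorical columns would need to be encoded numerically before fitting.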
Datasets used:
1. Wine Quality http://archive.ics.uci.edu/ml/datasets/Wine+Quality
2. Student Performance http://archive.ics.uci.edu/ml/datasets/Student+Performance
3. Adult Dataset https://archive.ics.uci.edu/ml/datasets/Adult
4. Forest Fires http://archive.ics.uci.edu/ml/datasets/forest+fires
We also used Gaussian mixture model (GMM) sampling to generate sample data from the datasets listed above, then ran the implemented model on the sampled data and tested the results.