Kaggle Competition | Porto Seguro’s Safe Driver Prediction
- Porto Seguro’s Safe Driver Analysis
- Deleted picture 1 in Scatterplot ('ps_reg_01', 'ps_reg_02', 'ps_reg_03')
- Deleted picture 2 in Scatterplot ('ps_calc_01', 'ps_calc_02', 'ps_calc_03')
- In this competition, you will predict the probability that an auto insurance policy holder files a claim.
In the train and test data, features that belong to similar groupings are tagged as such in the feature names (e.g., ind, reg, car, calc). In addition, feature names include the postfix bin to indicate binary features and cat to indicate categorical features. Features without these designations are either continuous or ordinal. A value of -1 indicates that the feature was missing from the observation. The target column indicates whether or not a claim was filed for that policy holder.
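The naming convention above can be used to split columns by type and to count missing values. The following is a minimal sketch, not the code used in this repository: the helper name group_features and the tiny sample frame are hypothetical, the values are made up, and it assumes the data has an id column and a target column next to the features.

```python
import numpy as np
import pandas as pd


def group_features(columns):
    """Classify feature columns by their name postfix (hypothetical helper)."""
    groups = {"binary": [], "categorical": [], "other": []}
    for col in columns:
        if col in ("id", "target"):  # assumed non-feature columns
            continue
        if col.endswith("_bin"):
            groups["binary"].append(col)
        elif col.endswith("_cat"):
            groups["categorical"].append(col)
        else:
            # continuous or ordinal; the name alone does not say which
            groups["other"].append(col)
    return groups


# Made-up sample rows that follow the naming convention.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "target": [0, 1, 0],
    "ps_ind_06_bin": [0, 1, 1],
    "ps_car_01_cat": [10, -1, 7],
    "ps_reg_03": [0.7, -1.0, 1.1],
})

groups = group_features(df.columns)

features = df.drop(columns=["id", "target"])
# Share of rows per column where the value is the missing marker -1.
missing_share = (features == -1).mean()
# Replace the -1 marker with NaN so pandas treats it as missing.
features_nan = features.replace(-1, np.nan)

print(groups)
print(missing_share)
```

Whether to keep -1 as its own category or convert it to NaN depends on the model; tree-based boosters can often work with either.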
This competition is well suited to those new to Kaggle, and the analysis here is done in Python.
[My focus was on]
- EDA - Focusing on dependent variable
- Data type identification & splitting
- Missing data handling & deletion
- VarianceThreshold - finding variance & balance
- Correlation & Density Analysis
- One-Hot Encoding
- Feature engineering (Address, Datetime)
- Sparse matrix (csr_matrix)
- Boosting Model Selection (CatBoost)
- Metrics - model_selection (Normalized Gini Coefficient)
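The Normalized Gini Coefficient in the last item can be sketched as follows. This is one common sorting-based formulation, not necessarily the implementation used in this repository; the function names gini and normalized_gini and the sample arrays are illustrative assumptions. For a binary target with no tied predictions, the result equals 2 * AUC - 1.

```python
import numpy as np


def gini(actual, pred):
    """Unnormalized Gini: how well sorting by pred concentrates the positives."""
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    n = len(actual)
    # Sort by prediction, highest first; ties keep their original order.
    order = np.lexsort((np.arange(n), -pred))
    sorted_actual = actual[order]
    # Cumulative share of positives captured while walking down the ranking.
    cum_share = np.cumsum(sorted_actual) / sorted_actual.sum()
    # Subtract the area expected from a random ordering.
    return (cum_share.sum() - (n + 1) / 2.0) / n


def normalized_gini(actual, pred):
    """Gini of the predictions divided by the Gini of a perfect ranking."""
    return gini(actual, pred) / gini(actual, actual)


# Illustrative values only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(normalized_gini(y_true, y_score))  # 0.5
```

A perfect ranking scores 1, a random ranking scores about 0, and a fully reversed ranking scores -1, so only the ordering of the predicted probabilities matters, not their scale.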
[Dependencies & Tech]:
- IPython
- NumPy
- Pandas
- SciKit-Learn
- SciPy
- Seaborn
- Matplotlib
- Plotly
- Folium
- StatsModels
- LightGBM
- CatBoost
Nothing ruins the thrill of buying a brand new car more quickly than seeing your new insurance bill. The sting’s even more painful when you know you’re a good driver. It doesn’t seem fair that you have to pay so much if you’ve been cautious on the road for years.
Porto Seguro, one of Brazil’s largest auto and homeowner insurance companies, completely agrees. Inaccuracies in car insurance companies’ claim predictions raise the cost of insurance for good drivers and reduce the price for bad ones.
In this competition, you’re challenged to build a model that predicts the probability that a driver will initiate an auto insurance claim in the next year. While Porto Seguro has used machine learning for the past 20 years, they’re looking to Kaggle’s machine learning community to explore new, more powerful methods. A more accurate prediction will allow them to further tailor their prices, and hopefully make auto insurance coverage more accessible to more drivers.