This project is an in-depth analysis and machine learning modeling exercise for the Kaggle competition "House Price Prediction." The goal is to predict residential home prices in Ames, Iowa, using various explanatory variables.
- Data Exploration: Analysis of the distribution of data, identification of missing values, and understanding relationships between features and the target variable.
- Data Preprocessing: Handling missing values, feature engineering, and data scaling/transformation.
- Model Selection: Evaluation of multiple machine learning models to identify the most effective ones.
- Hyperparameter Tuning: Optimization of the chosen models for improved performance.
- Model Evaluation: Use of cross-validation and other techniques to assess model performance.
- Prediction and Submission: Generating predictions for the test dataset and preparing a submission for the Kaggle competition.
- Data Wrangling and Exploration: Pandas, NumPy, SciPy
- Visualization: Matplotlib, Seaborn
- Machine Learning: Scikit-learn, XGBoost, LightGBM, CatBoost
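
The exploration, preprocessing, model selection, and evaluation steps listed above can be sketched with scikit-learn alone. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`area`, `quality`, `zone`, `price`) are hypothetical, and the two candidate models are stand-ins for whichever models the project compares. It assumes the target is log-transformed before fitting, a common choice for skewed prices.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the training data (hypothetical columns).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "area": rng.normal(1500, 400, n).clip(500, None),
    "quality": rng.integers(1, 11, n).astype(float),
    "zone": rng.choice(["A", "B", "C"], n),
})
df["price"] = (50_000 + 80 * df["area"] + 10_000 * df["quality"]
               + rng.normal(0, 10_000, n))

# Knock out some values to mimic missing data.
df.loc[rng.choice(n, 20, replace=False), "area"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "zone"] = np.nan

# Exploration: count missing values per column.
missing = df.isna().sum()

# Preprocessing: impute and scale numeric columns, impute and one-hot
# encode categorical columns.
numeric, categorical = ["area", "quality"], ["zone"]
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]),
     categorical),
])

def make_model(regressor):
    """Wrap a regressor with preprocessing and a log1p target transform."""
    pipe = Pipeline([("prep", preprocess), ("reg", regressor)])
    return TransformedTargetRegressor(regressor=pipe,
                                      func=np.log1p, inverse_func=np.expm1)

# Model selection: compare candidates by 5-fold cross-validated RMSE.
X, y = df.drop(columns="price"), df["price"]
cv = KFold(n_splits=5, shuffle=True, random_state=0)
candidates = {"ridge": Ridge(alpha=1.0),
              "gbr": GradientBoostingRegressor(random_state=0)}
scores = {}
for name, reg in candidates.items():
    neg_rmse = cross_val_score(make_model(reg), X, y, cv=cv,
                               scoring="neg_root_mean_squared_error")
    scores[name] = -neg_rmse.mean()

print(missing)
print(scores)
```

Keeping imputation and scaling inside the pipeline means they are refit on each training fold, so no statistics from the validation fold leak into preprocessing.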
The dataset comprises 79 explanatory variables detailing various aspects of residential homes. More details can be found in the dataset description.
- Perform a thorough exploratory data analysis.
- Prepare the data for machine learning modeling.
- Build and tune an ensemble model to predict house prices accurately.
- Assess model performance with suitable metrics.
- Create a final prediction for competition submission.
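
As a rough sketch of the last three objectives, the example below tunes a small averaging ensemble with a grid search and writes a submission file. Everything specific here is an assumption for illustration: the data is synthetic, the feature names and ID values are made up, `VotingRegressor` stands in for whatever ensembling approach the project uses, and the `Id`/`SalePrice` column names and the log-scale RMSE metric are assumed to match the competition's format rather than taken from this README.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import (GradientBoostingRegressor,
                              RandomForestRegressor, VotingRegressor)
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic, already-preprocessed features (hypothetical names and IDs).
rng = np.random.default_rng(1)
cols = ["f0", "f1", "f2", "f3"]
X_train = pd.DataFrame(rng.normal(size=(150, 4)), columns=cols)
y_train = np.exp(12 + 0.3 * X_train["f0"] - 0.2 * X_train["f1"]
                 + rng.normal(0, 0.05, 150))
X_test = pd.DataFrame(rng.normal(size=(30, 4)), columns=cols)
test_ids = np.arange(1001, 1031)

# Averaging ensemble of three different model families.
ensemble = VotingRegressor([
    ("ridge", Ridge()),
    ("gbr", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
])

# Tune a few member hyperparameters; "<name>__<param>" addresses a member.
param_grid = {
    "ridge__alpha": [0.1, 1.0, 10.0],
    "gbr__n_estimators": [50, 100],
}
search = GridSearchCV(ensemble, param_grid, cv=3,
                      scoring="neg_root_mean_squared_error")

# Fit on log prices so the cross-validated score is an RMSE in log space.
search.fit(X_train, np.log1p(y_train))
log_rmse = -search.best_score_

# Predict, undo the log transform, and write the submission file.
pred = np.expm1(search.predict(X_test))
submission = pd.DataFrame({"Id": test_ids, "SalePrice": pred})
submission.to_csv("submission.csv", index=False)

print(search.best_params_)
print(f"CV RMSE (log scale): {log_rmse:.4f}")
```

In the real project, XGBoost, LightGBM, or CatBoost regressors could take the place of the scikit-learn members, since they expose a compatible estimator interface.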