Welcome to the Income Predictor repository! This project uses machine learning to predict income from the features in the adult.csv dataset, using Random Forest and Gradient Boosting classifiers.
The script preprocesses the dataset by handling missing values and converting categorical columns to numerical ones. It then splits the data into training and testing sets and trains a Random Forest model and a Gradient Boosting model. The Random Forest model reaches an accuracy of 0.86 on the test set.
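The workflow described above can be sketched roughly as follows. This is a minimal illustration, not the code in Income_Predictor.py: it builds a small synthetic table instead of reading adult.csv, and the column names, the "?" missing-value marker, the mode-based imputation, the category-code encoding, and the 80/20 split are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for adult.csv; column names and values are hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=n),
    "workclass": rng.choice(["Private", "Self-emp", "Gov", "?"], size=n),
    "education": rng.choice(["HS-grad", "Bachelors", "Masters"], size=n),
    "hours_per_week": rng.integers(10, 60, size=n),
})
# Toy target so the models have a pattern to learn (assumption, not real data).
df["income"] = np.where(df["age"] + df["hours_per_week"] > 85, ">50K", "<=50K")

# Treat "?" as missing, then fill each column with its most frequent value.
df = df.replace("?", np.nan)
for col in df.columns:
    df[col] = df[col].fillna(df[col].mode()[0])

# Convert every non-numeric column to integer category codes.
for col in df.columns:
    if not pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].astype("category").cat.codes

# Split into training and testing sets (80/20, stratified on the target).
X = df.drop(columns="income")
y = df["income"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Train both models and record their test accuracy.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name} accuracy: {scores[name]:.2f}")
```

The accuracies printed here come from the toy data and say nothing about the 0.86 reported for the real dataset.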
- Clone the repository:
git clone https://github.com/Armanx200/Income-Predictor.git
- Navigate to the project directory:
cd Income-Predictor
- Install the required libraries:
pip install -r requirements.txt
- Run the predictor:
python Income_Predictor.py
- Income_Predictor.py: Main script for training and evaluating the models.
- adult.csv: The dataset used for training.
- requirements.txt: List of required libraries for the project.
- Figure.png: Plot showing feature importances of the model.
- Data Preprocessing: Handles missing values and encodes categorical variables.
- Model Training: Trains both Random Forest and Gradient Boosting models.
- Hyperparameter Tuning: Uses GridSearchCV for finding the best hyperparameters.
- Model Evaluation: Provides accuracy, classification report, and confusion matrix.
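The tuning and evaluation steps listed above might look something like the sketch below. It uses a synthetic dataset from `make_classification` rather than adult.csv, and the parameter grid, the 3-fold cross-validation, and the choice to tune only the Random Forest are assumptions for illustration; the grid actually used in Income_Predictor.py is not shown in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification data standing in for the preprocessed dataset.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical grid: every combination is scored with 3-fold cross-validation.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Evaluate the best model found by the search on the held-out test set.
y_pred = search.best_estimator_.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print("Best parameters:", search.best_params_)
print(f"Accuracy: {accuracy:.2f}")
print(classification_report(y_test, y_pred))
print(cm)
```

A larger grid usually finds better settings at the cost of longer training, since the number of fits grows with the product of the grid sizes and the number of folds.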
- Random Forest Classifier
- Gradient Boosting Classifier
- Random Forest:
  - Accuracy: 0.86
  - Detailed classification report and confusion matrix available in the output.
- Gradient Boosting:
  - Try running the script to check the performance metrics.
- Add more models to compare.
- Perform more extensive hyperparameter tuning.
- Implement advanced feature engineering techniques.
For any questions or suggestions, feel free to reach out:
- GitHub: Armanx200
Made with ❤️ by Armanx200