Welcome to our Crop Yield Prediction Regression project, where we use machine learning to predict crop yields for 10 of the most consumed crops worldwide.
We've designed and implemented a crop yield prediction model and compared several ML algorithms: Random Forest Regressor, Gradient Boosting Regressor, Decision Tree Regressor, and Support Vector Regressor.
We've gone through a few steps to get to our results:
- Gathering and Cleaning Data: We removed unwanted columns from the CSV files, dropped rows with null values, and merged the yield CSV file with the pesticides and average rainfall CSV files to form a combined dataframe with all the required attributes.
- Data Exploration: We explored the dataset visually to understand it and to see the correlations between the different attributes.
- Data Preprocessing: We encoded the categorical variables, scaled the features, and split the data into training and test sets in a 70:30 ratio.
- Model Training, Comparison, and Selection: We trained four models: Gradient Boosting Regressor, Random Forest Regressor, Support Vector Regressor, and Decision Tree Regressor. We evaluated them with the R^2 (coefficient of determination) regression score. The Decision Tree Regressor gave the best R^2 score, 96.04%.
- Model Results & Conclusions: We visualized the importance of the top 7 features in determining the crop yield, and viewed the yield predicted for the top 10 most consumed crops. We also used the Graphviz library to view the decision tree created by the model.
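The steps above can be sketched end to end on a small synthetic dataset. This is a minimal illustration, not the notebook's actual code: the column names (`area`, `item`, `year`, `rainfall`, `pesticides`, `crop_yield`), the made-up values, one-hot encoding, `MinMaxScaler`, and the default hyperparameters are all assumptions. The scaler is fitted on the training split only, which is one common way to avoid leaking test data into preprocessing.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor, export_graphviz

# --- Synthetic stand-ins for the three CSV files (all values are made up) ---
rng = np.random.default_rng(0)
keys = pd.MultiIndex.from_product(
    [["A", "B", "C"], ["Maize", "Rice", "Wheat"], range(2000, 2010)],
    names=["area", "item", "year"],
).to_frame(index=False)

area_year = keys[["area", "year"]].drop_duplicates().reset_index(drop=True)
rain_df = area_year.assign(rainfall=rng.uniform(500, 2000, len(area_year)))
pest_df = area_year.assign(pesticides=rng.uniform(100, 1000, len(area_year)))

# Hypothetical yield: a per-crop baseline plus a rainfall effect plus noise.
base = {"Maize": 30000, "Rice": 40000, "Wheat": 20000}
tmp = keys.merge(rain_df, on=["area", "year"])
yield_df = tmp[["area", "item", "year"]].copy()
yield_df["crop_yield"] = (
    tmp["item"].map(base) + 20 * tmp["rainfall"] + rng.normal(0, 500, len(tmp))
)

# --- Gathering and cleaning: merge the three tables, drop rows with nulls ---
df = (
    yield_df.merge(rain_df, on=["area", "year"])
    .merge(pest_df, on=["area", "year"])
    .dropna()
)

# --- Preprocessing: one-hot encode categoricals, 70:30 split, scale ---
X = pd.get_dummies(df.drop(columns="crop_yield"), columns=["area", "item"], dtype=float)
y = df["crop_yield"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
scaler = MinMaxScaler().fit(X_train)  # fitted on the training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# --- Training and comparison using the R^2 score on the test set ---
models = {
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(random_state=42),
    "Support Vector": SVR(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test_s))
best_name = max(scores, key=scores.get)
print(pd.Series(scores).sort_values(ascending=False))

# --- Results: top 7 feature importances and a Graphviz (DOT) export ---
tree = models["Decision Tree"]
importances = pd.Series(tree.feature_importances_, index=X.columns).nlargest(7)
print(importances)

# With out_file=None, export_graphviz returns the DOT source as a string.
dot_source = export_graphviz(
    tree, out_file=None, feature_names=list(X.columns), max_depth=3
)
```

The scores printed here come from synthetic data and say nothing about the project's real results. The `dot_source` string can be rendered with Pydot or Graphviz to view the tree.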
We've used the following libraries:
- NumPy
- Pandas
- Matplotlib
- Scikit-learn
- Seaborn
- Pydot
- Graphviz
You can find the notebook for this project here.
We've used data from the FAO (Food and Agriculture Organization of the United Nations) and the World Bank.
So, that's about it! If you like what you see, give us a star ⭐. Thanks and happy coding! 🚀