This project implements a machine learning algorithm based on Zdzislaw Pawlak's Rough Set Theory that predicts from weather conditions whether golf will be played.
The project consists of the following files:
- Train_data_golf_14ex.csv: Training dataset.
- Test_data_golf_50ex.csv: Test dataset.
- algorithm.py: The main script with the implementation of the algorithm.
- Clone the repository:
git clone https://github.com/your-username/rst-golf-prediction.git
- Go to your project folder:
cd rst-golf-prediction
- Install required dependencies:
pip install pandas
- Place your CSV data files in your project root folder.
- In the script, set the paths to the training and test datasets according to where the files are stored on your computer:
df_path = 'Put your personal path here'
df_test_path = 'Put your personal path here too'
- Run the script RS-ML.py:
python RS-ML.py
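If the CSV files sit in the project root, the two path variables can be derived from the working directory instead of being typed by hand. The snippet below is only a sketch under that assumption: `project_root` is a hypothetical helper name, and the file names are the ones listed at the top of this README.

```python
from pathlib import Path

# Assumption: the script is started from the project root folder,
# so the current working directory is where the CSV files live.
project_root = Path.cwd()

# File names as listed in this README; adjust them if your files differ.
df_path = project_root / "Train_data_golf_14ex.csv"
df_test_path = project_root / "Test_data_golf_50ex.csv"

# Report a missing file early instead of failing later while reading it.
for path in (df_path, df_test_path):
    if not path.is_file():
        print(f"Missing data file: {path}")
```

Building the paths with `pathlib` keeps them valid on Windows, Linux, and macOS without editing separators.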
Outlook | Humidity % | Wind | Play |
---|---|---|---|
Overcast | 87 | False | Yes |
Sunny | 80 | True | Yes |
Sunny | 80 | True | Yes |
Overcast | 75 | True | Yes |
Overcast | 75 | True | Yes |
Rainy | 80 | False | No |
Sunny | 80 | True | No |
Rainy | 80 | False | No |
Rainy | 85 | False | No |
Overcast | 87 | False | Yes |
After launch, the script prints the following intermediate results, which show how the production rules are constructed:
Getting an elementary subsets of dataset:
[[0, 9], [1, 2, 6], [3, 4], [5, 7], [8]]
[[0, 9], [3, 4]]
======== Production rules for positive region ========
1) IF (Outlook = Overcast)& (Humidity% = 87 & 75)& (Wind = False & True)& THEN DECISION "PLAY" = PLAY
======== Production rules for negative region ========
2) IF (Outlook = Rainy)&(Humidity% = 85 V 80)&(Wind = False) THEN DECISION "PLAY" = DON'T PLAY
======== Production rules for boundry region ========
3) IF (Outlook = Sunny)&(Humidity% = 80)&(Wind = True) THEN DECISION "PLAY" = MAYBE PLAY
Approximation accuracy: 0.571
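The elementary subsets, the lower approximation, and the approximation accuracy shown above can be reproduced by hand from the ten training rows. The following is a minimal sketch in plain Python, not the project's own implementation: the function names `elementary_subsets`, `lower_approximation`, and `upper_approximation` are hypothetical, and the approximation accuracy is assumed to be the number of objects in the lower approximation divided by the number in the upper approximation.

```python
from collections import defaultdict

# The ten training rows from the table above: (Outlook, Humidity %, Wind, Play).
rows = [
    ("Overcast", 87, False, "Yes"),
    ("Sunny", 80, True, "Yes"),
    ("Sunny", 80, True, "Yes"),
    ("Overcast", 75, True, "Yes"),
    ("Overcast", 75, True, "Yes"),
    ("Rainy", 80, False, "No"),
    ("Sunny", 80, True, "No"),
    ("Rainy", 80, False, "No"),
    ("Rainy", 85, False, "No"),
    ("Overcast", 87, False, "Yes"),
]


def elementary_subsets(rows):
    """Group row indexes that share identical condition attribute values."""
    groups = defaultdict(list)
    for index, (*conditions, _decision) in enumerate(rows):
        groups[tuple(conditions)].append(index)
    return list(groups.values())


def lower_approximation(elementary, target):
    """Elementary subsets that lie entirely inside the target set."""
    return [subset for subset in elementary if set(subset) <= target]


def upper_approximation(elementary, target):
    """Elementary subsets that share at least one object with the target set."""
    return [subset for subset in elementary if set(subset) & target]


# Target concept: indexes of the rows where Play = Yes.
play_yes = {index for index, row in enumerate(rows) if row[-1] == "Yes"}

elementary = elementary_subsets(rows)
lower = lower_approximation(elementary, play_yes)
upper = upper_approximation(elementary, play_yes)

# Assumed definition: |lower approximation| / |upper approximation|, counted in objects.
accuracy = sum(len(s) for s in lower) / sum(len(s) for s in upper)

print(elementary)          # [[0, 9], [1, 2, 6], [3, 4], [5, 7], [8]]
print(lower)               # [[0, 9], [3, 4]]
print(round(accuracy, 3))  # 0.571
```

The subset `[1, 2, 6]` holds both "Yes" and "No" rows, so it belongs to the upper but not the lower approximation. That is why the three Sunny rows end up in the boundary region, and why the accuracy is 4 / 7.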
The final result is the classification of the test dataset by the constructed rules, shown next to the true values for comparison:
Outlook | Humidity % | Wind | Play | Classification |
---|---|---|---|---|
Overcast | 87 | False | Yes | Yes |
Sunny | 80 | True | Yes | Maybe |
Rainy | 80 | True | Yes | Unknown |
Sunny | 75 | True | Yes | Maybe |
NaN | 75 | True | Yes | Unknown |
Overcast | 80 | False | No | Yes |
Raqiny | 80 | True | No | No |
Accuracy of the classification RS1: 42.9 %
The main implemented functions of the algorithm are:
- get_elementary_subsets(X): Returns the elementary subsets of a set of objects.
- get_lower(elementary, X_true_indexes): Forms the lower approximation.
- get_upper(elementary, X_true_indexes): Forms the upper approximation.
- get_pos_rule(pos_dataframe): Creates production rules for the positive region (the lower approximation).
- get_neg_rule(not_pos_dataframe): Creates production rules for the negative region (the objects outside the upper approximation).
- get_maybe_rule(maybe_dataframe): Creates production rules for the boundary region.
- classify_new_data(row, pos_df, maybe_df, neg_df): Classifies a row of the test dataset using the constructed rules.
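A production rule for a region can be read as: for each condition attribute, collect the distinct values that occur in the region and join them with "V" (logical OR), then join the attributes with "&". The sketch below illustrates that idea in plain Python. `build_rule` is a hypothetical helper, not one of the functions listed above, and its output formatting is only an approximation of what the script prints.

```python
def build_rule(region_rows, attribute_names, decision):
    """Build one IF ... THEN rule string from the rows of a single region."""
    conditions = []
    for position, name in enumerate(attribute_names):
        # Collect the distinct values of this attribute, keeping first-seen order.
        values = []
        for row in region_rows:
            if row[position] not in values:
                values.append(row[position])
        joined = " V ".join(str(value) for value in values)
        conditions.append(f"({name} = {joined})")
    return "IF " + " & ".join(conditions) + f" THEN PLAY = {decision}"


# Condition attributes of the negative-region rows (indexes 5, 7, 8) of the training table.
negative_region = [
    ("Rainy", 80, False),
    ("Rainy", 80, False),
    ("Rainy", 85, False),
]

rule = build_rule(negative_region, ["Outlook", "Humidity %", "Wind"], "DON'T PLAY")
print(rule)
# IF (Outlook = Rainy) & (Humidity % = 80 V 85) & (Wind = False) THEN PLAY = DON'T PLAY
```

Collapsing a region into one rule per attribute keeps the rule set short, at the cost of also covering value combinations that never appeared together in the training data.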