- Parkinsson Disease.csv : Dataset 1 with 22 features.
- pd_speech_features.csv : Dataset 2 with 753 features.
- Model_1st_Dataset.ipynb : Contains the unique bagging model trained on dataset 1. [Note: uses mRMR for feature selection.]
- Model_2nd_Dataset.ipynb : Contains the unique bagging model trained on dataset 2. [Note: uses mRMR for feature selection.]
- model_testing.ipynb : Contains the metrics of the classifier models used in the bagging classifier, without feature selection (FS).
- pearson_selection_resampling.ipynb : Contains the code of the new model [new feature selection + sample resampling + new bagging model where the bags are based on the samples].
- FS_classif.ipynb : Contains the metrics of the classifier models used in the bagging classifier, with feature selection (FS).
- audio_extractor.py : A Streamlit application that processes an uploaded audio file and extracts its features. Check it out here
- Files 3 and 4 are for understanding the classifier model. Note that the classification model there includes feature resampling and uses mRMR as the feature selection method.
- Files 5, 6, and 7 currently use "pd_speech_features.csv" as the dataset.
- To use "Parkinsson Disease.csv" instead, replace the dataset file name and replace 'id' with 'name' and 'class' with 'status'.
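
The column swap described in the last note can be captured in a small helper. The following is only a minimal sketch: the `load_dataset` function and the `DATASET_COLUMNS` mapping are hypothetical names that do not appear in the notebooks, and the in-memory CSV is a made-up stand-in for a real file. The column names ('name'/'status' and 'id'/'class') are taken from the notes above.

```python
import io

import pandas as pd

# Hypothetical mapping: file name -> (identifier column, label column).
# The column names come from the notes above; the mapping itself is an illustration.
DATASET_COLUMNS = {
    "Parkinsson Disease.csv": ("name", "status"),
    "pd_speech_features.csv": ("id", "class"),
}


def load_dataset(source, id_col, label_col):
    """Read a CSV and split it into features X and labels Y."""
    df = pd.read_csv(source)
    df = df.drop(columns=[id_col])      # the identifier carries no signal
    X = df.drop(columns=[label_col])    # everything except the label
    Y = df[label_col]
    return X, Y


# Tiny in-memory stand-in for a real file, only to show the call shape.
sample_csv = io.StringIO(
    "name,f1,f2,status\n"
    "a,0.1,1.0,1\n"
    "b,0.2,2.0,0\n"
    "c,0.3,3.0,1\n"
)
id_col, label_col = DATASET_COLUMNS["Parkinsson Disease.csv"]
X_demo, Y_demo = load_dataset(sample_csv, id_col, label_col)
print(X_demo.shape, Y_demo.tolist())
```

With a helper of this shape, switching datasets would only require changing the file name passed in, since the matching column names are looked up from the mapping.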
import numpy as np
import pandas as pd
df = pd.read_csv("Parkinsson Disease.csv")
# Data cleaning: drop the 'name' identifier column
df = df.drop('name', axis=1)
# Data preprocessing: split into features (X) and labels (Y)
X = df.drop('status', axis=1)
Y = df['status']
import numpy as np
import pandas as pd
df = pd.read_csv('pd_speech_features.csv')
# Data cleaning
df = df.drop('id', axis=1)  # Remove the 'id' column
# Data preprocessing
X = df.drop('class', axis=1)
Y = df['class']
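
Once `X` and `Y` are prepared, a Pearson-based feature selection step like the one suggested by the name pearson_selection_resampling.ipynb could look roughly like the sketch below. The notebook's actual selection code is not shown here, so the function name `select_by_pearson`, the parameter `k`, and the choice to rank features by absolute correlation with the label are all assumptions. The toy data is synthetic and exists only to make the example runnable.

```python
import numpy as np
import pandas as pd


def select_by_pearson(X, y, k):
    """Return the names of the k features with the largest |Pearson r| against y.

    Hypothetical helper: an assumption about how a Pearson-based
    selection step might be written, not the notebook's code.
    """
    # corrwith computes the Pearson correlation of each column with y
    scores = X.corrwith(y).abs()
    # Constant columns produce NaN correlations; leave them out of the ranking
    scores = scores.dropna()
    return scores.sort_values(ascending=False).head(k).index.tolist()


# Synthetic toy data: one feature tracks the label, two are pure noise.
rng = np.random.default_rng(0)
y_toy = pd.Series(rng.integers(0, 2, size=200))
X_toy = pd.DataFrame({
    "informative": y_toy + rng.normal(0, 0.1, size=200),
    "noise_a": rng.normal(size=200),
    "noise_b": rng.normal(size=200),
})

top_features = select_by_pearson(X_toy, y_toy, k=1)
print(top_features)
```

The selected column names can then be used to subset the feature matrix, for example `X[top_features]`, before training the classifiers.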