Cereberal-Stroke-Analysis

# Process Followed

Read Data:

The script starts by importing necessary libraries (pandas, numpy, seaborn, matplotlib.pyplot) and reading a CSV file into a DataFrame (df).

Exploratory Data Analysis (EDA):

The dataset is explored with head() and describe(), and missing values are counted per column with isnull().sum().
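The read and explore steps might look like the sketch below. The inline CSV text and its column names (age, gender, bmi, stroke) are hypothetical stand-ins so the example runs on its own; the actual script reads Cereberal_Dataset.csv, whose columns may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; real column names may differ.
csv_text = """age,gender,bmi,stroke
67,Male,36.6,1
61,Female,,0
80,Male,32.5,0
49,Female,34.4,0
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head())       # first rows of the DataFrame
print(df.describe())   # summary statistics for numeric columns

# Count of missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```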

Handling Categorical Variables:

One-hot encoding is performed on categorical variables using pd.get_dummies().
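A minimal sketch of the encoding step, assuming two hypothetical categorical columns (gender and smoking_status). Passing dtype=int is a choice made here so the indicator columns come out as 0/1 rather than booleans.

```python
import pandas as pd

# Hypothetical sample; the real dataset's categorical columns may differ.
df = pd.DataFrame(
    {
        "age": [67, 61, 80],
        "gender": ["Male", "Female", "Male"],
        "smoking_status": ["never", "smokes", "never"],
    }
)

# Each category becomes its own 0/1 indicator column.
encoded = pd.get_dummies(df, columns=["gender", "smoking_status"], dtype=int)
print(encoded)
```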

Handling Missing Values:

Missing values are imputed using the k-nearest neighbors algorithm (KNNImputer from sklearn.impute).
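As an illustration of how KNNImputer fills a gap, the sketch below uses a tiny made-up array. The value n_neighbors=2 is an assumption for the example; the script's actual setting is not stated here.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Made-up data with one missing entry in the second column.
X = np.array(
    [
        [1.0, 2.0],
        [2.0, np.nan],
        [3.0, 6.0],
        [4.0, 8.0],
    ]
)

# Each missing value is replaced by the mean of that feature
# over the nearest rows that have the feature present.
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
print(X_filled)
```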

Feature Scaling and Train-Test Split:

Features are scaled using MinMaxScaler, and the dataset is split into training and testing sets.
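One way to combine the two steps is sketched below on synthetic data. Note two assumptions that go beyond the description above: the split is stratified on the label, and the scaler is fitted on the training split only and then applied to the test split, which avoids leaking test-set ranges into training. The test_size and random_state values are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in: 100 rows, 3 features, 10% positive class.
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

# Stratify so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on training data only, then reuse it for the test data.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))
```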

Model Selection:

Several classification models are chosen (KNeighborsClassifier, GaussianNB, DecisionTreeClassifier, and RandomForestClassifier) for initial testing.

Model Evaluation Without Resampling:

A classification report is generated for each model to evaluate its performance on the imbalanced dataset.
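The baseline evaluation could be sketched as follows. The data comes from make_classification with a roughly 95/5 class ratio as a stand-in for the stroke dataset, and all models use default hyperparameters (plus a fixed random_state), which is an assumption rather than the script's confirmed configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic imbalanced data standing in for the real dataset.
X, y = make_classification(
    n_samples=1000, n_features=8, weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "KNeighborsClassifier": KNeighborsClassifier(),
    "GaussianNB": GaussianNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=42),
    "RandomForestClassifier": RandomForestClassifier(random_state=42),
}

reports = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    # zero_division=0 avoids warnings when a model never predicts the minority class.
    reports[name] = classification_report(y_test, predictions, zero_division=0)
    print(name)
    print(reports[name])
```

On imbalanced data, the per-class recall in each report is usually more informative than overall accuracy, since a model can score high accuracy while missing most minority cases.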

OverSampling (SMOTE):

The script uses the Synthetic Minority Over-sampling Technique (SMOTE) to oversample the minority class.
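To show the idea behind SMOTE, here is a from-scratch NumPy sketch of its core step: each synthetic row is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. This is an illustration only, not the library implementation the script presumably calls; the function name smote_sketch and its parameters are hypothetical.

```python
import numpy as np


def smote_sketch(X_minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = np.random.default_rng(seed)

    # Pairwise Euclidean distances between minority samples.
    diffs = X_minority[:, None, :] - X_minority[None, :, :]
    dist = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbour

    # Indices of the k nearest minority neighbours of each sample.
    neighbours = np.argsort(dist, axis=1)[:, :k]

    # Pick a random base sample and one of its neighbours for each new row.
    base = rng.integers(0, len(X_minority), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]

    # Interpolate at a random position between base and neighbour.
    gap = rng.random((n_new, 1))
    return X_minority[base] + gap * (X_minority[chosen] - X_minority[base])


rng = np.random.default_rng(1)
X_minority = rng.normal(size=(10, 2))  # made-up minority-class rows
synthetic = smote_sketch(X_minority, n_new=40)
print(synthetic.shape)
```

Because every synthetic row is a convex combination of two real minority rows, the new points stay inside the region the minority class already occupies.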

Model Evaluation After OverSampling:

The same models are re-trained and evaluated on the oversampled dataset.

UnderSampling:

Random under-sampling is performed to balance the class distribution.
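Random under-sampling can be illustrated with plain NumPy: keep every minority row and draw an equally sized random subset of each larger class. The helper random_undersample below is a hypothetical sketch of the idea, not the function the script uses.

```python
import numpy as np


def random_undersample(X, y, seed=0):
    """Randomly drop rows so every class has as many rows as the smallest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    n_keep = counts.min()

    # Sample n_keep row indices per class, without replacement.
    keep = np.concatenate(
        [
            rng.choice(np.flatnonzero(y == label), size=n_keep, replace=False)
            for label in classes
        ]
    )
    rng.shuffle(keep)  # avoid returning rows grouped by class
    return X[keep], y[keep]


# Made-up data: 90 negatives, 10 positives.
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.array([0] * 90 + [1] * 10)

X_under, y_under = random_undersample(X, y)
print(np.unique(y_under, return_counts=True))
```

The trade-off is that most majority-class rows are discarded, so the models train on far less data.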

Model Evaluation After UnderSampling:

The models are re-trained and evaluated on the undersampled dataset.

Combining OverSampling and UnderSampling (SMOTEENN):

The SMOTEENN technique, which combines SMOTE and Edited Nearest Neighbours (ENN), is applied.

Model Evaluation After Combining OverSampling and UnderSampling:

The models are re-trained and evaluated on the combined dataset.

Conclusion:

The script produces a classification report for each model after each resampling technique.
The reports indicate that resampling, particularly SMOTEENN, improves the models' ability to identify stroke-positive cases (recall on the positive class).
