breast-cancer-logistic-regression

This repository demonstrates the use of Logistic Regression to classify breast cancer tumors as benign or malignant using the Breast Cancer Wisconsin (Original) dataset. It includes data exploration, model building with both full and subset feature sets, and performance evaluation using cross-validation.

Breast Cancer Classification using Logistic Regression

Overview

This project demonstrates how to use Logistic Regression to classify breast cancer tumors as either benign or malignant using the Breast Cancer Wisconsin (Original) dataset. The dataset consists of features derived from cell nuclei measurements that can be used to predict the nature of breast tumors. The model is built using Python's scikit-learn library.

Dataset

The dataset used in this project is the Breast Cancer Wisconsin (Original) dataset, made publicly available through the UCI Machine Learning Repository:

Citation:
Wolberg, W. (1990). Breast Cancer Wisconsin (Original) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5HP4Z

Features:

Clump Thickness
Uniformity of Cell Size
Uniformity of Cell Shape
Marginal Adhesion
Bare Nuclei
Bland Chromatin
Mitoses
and more.

Target:

2 - Benign
4 - Malignant

Project Structure

The project is organized as follows:

├── README.md              # Project Overview
├── breast_cancer.csv      # Dataset file
├── Breast Cancer.ipynb    # Jupyter notebook with analysis and model
#Objective
The goal of this project is to create a logistic regression model that accurately classifies tumors as benign or malignant. We explore both a model using all features and a model using a subset of key features for comparison.

#Key Steps:
Exploratory Data Analysis (EDA): We analyze the dataset, visualize relationships, and explore the distributions of features.
Feature Engineering: Feature selection based on correlation with the target class.
Model Training: Build logistic regression models using both the full feature set and a subset of important features.
Model Evaluation: Use cross-validation and confusion matrix to assess the performance, accuracy, and consistency of the models.

#Results
* We achieved the following results:

#Subset Model:

* Accuracy: 95.62%
* Cross-Validation Standard Deviation: 2.60%
#All Features Model:

* Accuracy: 96.70%
* Cross-Validation Standard Deviation: 1.97%
* The subset model performs almost as well as the full model while being simpler and more interpretable.

Dependencies
To run this project, install the required Python packages using:


numpy
pandas
matplotlib
seaborn
scikit-learn

#References
Wolberg, W. (1990). Breast Cancer Wisconsin (Original) [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5HP4Z

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.github/workflows		.github/workflows
breast-cancer-log_reg		breast-cancer-log_reg
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

breast-cancer-logistic-regression

Breast Cancer Classification using Logistic Regression

Overview

Dataset

Features:

Target:

Project Structure

About

Releases

Packages

Languages

License

AArashev/breast-cancer-logistic-regression

Folders and files

Latest commit

History

Repository files navigation

breast-cancer-logistic-regression

Breast Cancer Classification using Logistic Regression

Overview

Dataset

Features:

Target:

Project Structure

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages