A machine learning model to predict heart disease using logistic regression. This project analyzes medical indicators to assist in early detection of potential heart conditions.
- Training Accuracy: 85.24%
- Test Accuracy: 80.49%
- Model Status: Generalizes reasonably well; the gap of about 4.75 percentage points between training and test accuracy suggests only mild overfitting
- Prediction Type: Binary classification (0: No Heart Disease, 1: Heart Disease)
- Source: Kaggle Heart Disease Dataset
- Records: 1025 entries
- Columns: 14 (13 input features + 1 target)
- Distribution:
  - Healthy (0): 499 cases
  - Heart Disease (1): 526 cases
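
To put the reported accuracy in context, the class counts above can be turned into class shares and a majority-class baseline (the accuracy of always predicting the more common class). This is a small arithmetic sketch using only the counts listed here; the helper name `class_shares` is hypothetical and not part of the project code.

```python
# Class counts as listed in the dataset summary.
counts = {"No Heart Disease (0)": 499, "Heart Disease (1)": 526}


def class_shares(counts):
    """Return each class's share of the total as a fraction (hypothetical helper)."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


shares = class_shares(counts)

# Accuracy of a trivial model that always predicts the most common class.
majority_baseline = max(shares.values())

for label, share in shares.items():
    print(f"{label}: {share:.2%}")
print(f"Majority-class baseline accuracy: {majority_baseline:.2%}")
```

The classes are close to balanced, so plain accuracy is a reasonable first metric and any useful model should land well above the baseline of roughly 51%.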
| Feature | Description | Range/Type |
|---|---|---|
| age | Age of the patient | 29-77 years |
| sex | Gender | 0: Female, 1: Male |
| cp | Chest pain type | 0-3 |
| trestbps | Resting blood pressure | 94-200 mm Hg |
| chol | Serum cholesterol | 126-564 mg/dl |
| fbs | Fasting blood sugar > 120 mg/dl | 0: False, 1: True |
| restecg | Resting ECG results | 0-2 |
| thalach | Maximum heart rate achieved | 71-202 |
| exang | Exercise induced angina | 0: No, 1: Yes |
| oldpeak | ST depression induced by exercise | 0-6.2 |
| slope | Slope of peak exercise ST segment | 0-2 |
| ca | Number of major vessels colored by fluoroscopy | 0-4 |
| thal | Thalassemia | 0-3 |
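
The ranges in the feature table can double as a sanity check before a record is passed to the model. The sketch below is one possible way to do that; `FEATURE_RANGES` and `find_out_of_range` are hypothetical names, the column order is assumed to match the table, and the sample tuple is the example input from this README's prediction function. Values outside the observed range are not necessarily invalid, only outside what the model saw during training.

```python
# Observed ranges copied from the feature table; the order is an assumption.
FEATURE_RANGES = {
    "age": (29, 77),
    "sex": (0, 1),
    "cp": (0, 3),
    "trestbps": (94, 200),
    "chol": (126, 564),
    "fbs": (0, 1),
    "restecg": (0, 2),
    "thalach": (71, 202),
    "exang": (0, 1),
    "oldpeak": (0, 6.2),
    "slope": (0, 2),
    "ca": (0, 4),
    "thal": (0, 3),
}


def find_out_of_range(values):
    """Return (feature, value) pairs that fall outside the observed ranges."""
    if len(values) != len(FEATURE_RANGES):
        raise ValueError(
            f"expected {len(FEATURE_RANGES)} values, got {len(values)}"
        )
    problems = []
    for (name, (low, high)), value in zip(FEATURE_RANGES.items(), values):
        if not low <= value <= high:
            problems.append((name, value))
    return problems


sample = (43, 0, 0, 132, 341, 1, 0, 136, 1, 3, 1, 0, 3)
print(find_out_of_range(sample))  # prints [] because every value is in range
```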
- Clone the repository:
git clone https://github.com/yourusername/heart-disease-prediction.git
cd heart-disease-prediction
- Create and activate virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install required packages:
pip install -r requirements.txt
import pandas as pd
from sklearn.model_selection import train_test_split
# Load data
heart_data = pd.read_csv('heart.csv')
# Split features and target
X = heart_data.drop(columns='target')
Y = heart_data['target']
# Split into training and test sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, stratify=Y, random_state=2)
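
The `stratify=Y` argument is what keeps the class ratio the same in both splits. The sketch below illustrates that effect on a synthetic label vector with the same class counts as the dataset (499 and 526); the placeholder feature column is made up, so this shows the behavior of `train_test_split` rather than anything about the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same class counts as the dataset (499 vs. 526).
y = np.array([0] * 499 + [1] * 526)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder feature, not real data

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# With stratification, the positive share should be nearly identical everywhere.
print(f"train rows: {len(y_train)}, test rows: {len(y_test)}")
print(f"positive share overall: {y.mean():.3f}")
print(f"positive share train:   {y_train.mean():.3f}")
print(f"positive share test:    {y_test.mean():.3f}")
```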
from sklearn.linear_model import LogisticRegression
# Initialize and train model
model = LogisticRegression()
model.fit(X_train, Y_train)
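
The model above is trained on unscaled features, and columns such as `chol` and `sex` sit on very different scales, which may slow the solver's convergence. One possible variant, shown here as a sketch and not as the project's method, standardizes features inside a scikit-learn pipeline. The data is generated with `make_classification` as a stand-in for `heart.csv`, and `max_iter=1000` is an assumed setting.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for heart.csv: 13 features, binary target.
X, y = make_classification(
    n_samples=1000, n_features=13, n_informative=6, random_state=2
)
X[:, 0] *= 100.0  # mimic columns on very different scales

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)

# Scaling lives inside the pipeline, so it is fit on the training data only.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_train, y_train)

print(f"Test accuracy on synthetic data: {scaled_model.score(X_test, y_test):.2%}")
```

Keeping the scaler in the pipeline means the same transformation is applied automatically at prediction time, and test-set statistics never leak into training.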
import numpy as np

def predict_heart_disease(input_data):
    # Example input: (43, 0, 0, 132, 341, 1, 0, 136, 1, 3, 1, 0, 3)
    # Reshape the 13 feature values into a single-row 2D array for predict()
    input_array = np.asarray(input_data).reshape(1, -1)
    prediction = model.predict(input_array)
    return "Heart Disease Detected" if prediction[0] == 1 else "No Heart Disease Detected"
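# Model Evaluation
from sklearn.metrics import accuracy_score

# accuracy_score expects the true labels first, then the predictions
print(f'Training Accuracy: {accuracy_score(Y_train, model.predict(X_train)):.2%}')
print(f'Testing Accuracy: {accuracy_score(Y_test, model.predict(X_test)):.2%}')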
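
Accuracy alone does not show which kind of error the model makes, and in a screening setting a missed case (false negative) usually matters more than a false alarm. The sketch below computes confusion-matrix counts, precision, and recall in plain Python. The label lists are made up for illustration and are not results from this project, and the helper names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Return accuracy, precision, and recall derived from the counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up labels for illustration only; not results from this project.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
print(confusion_counts(y_true, y_pred))
print(summarize(y_true, y_pred))
```

With the trained model, `y_true` would be `Y_test` and `y_pred` would be `model.predict(X_test)`.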
- Implement feature scaling
- Add cross-validation
- Try different algorithms (Random Forest, SVM)
- Add feature importance analysis
- Include ROC curve analysis
- Add confusion matrix visualization
- Perform hyperparameter tuning
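
As a starting point for the cross-validation and ROC items above, the following sketch scores a scaled logistic regression with 5-fold stratified cross-validation, reporting both accuracy and ROC AUC. It runs on synthetic data from `make_classification` as a stand-in for `heart.csv`; the fold count, `random_state`, and `max_iter` values are assumptions, not settings taken from this project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for heart.csv: 13 features, binary target.
X, y = make_classification(
    n_samples=1000, n_features=13, n_informative=6, random_state=2
)

pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Stratified folds keep the class ratio stable across every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

accuracy = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
roc_auc = cross_val_score(pipeline, X, y, cv=cv, scoring="roc_auc")

print(f"accuracy: {accuracy.mean():.3f} +/- {accuracy.std():.3f}")
print(f"ROC AUC:  {roc_auc.mean():.3f} +/- {roc_auc.std():.3f}")
```

Averaging over several folds gives a steadier estimate than a single 80/20 split, which matters for a dataset of about a thousand rows.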
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Dataset provided by Kaggle
- Inspiration from various heart disease research papers
- scikit-learn documentation and community
Author - Vislavath Pavani ♡
Deployed Link: https://mybinder.org/v2/gh/12pavani/Heart-Disease-Prediction-Model/69879853fe0cf7be0cad78b563af53f007784901?urlpath=lab%2Ftree%2FHeart%20Disease%20Prediction.ipynb