Password Strength Classification using Natural Language Processing

A Predictive Model for Classifying Passwords into Strong, Good, or Weak Categories to Enhance Password Security.

Project Overview

The Password Strength Classification project aims to enhance password security by developing a predictive model that classifies passwords into three categories: Strong, Good, and Weak. This classification helps users understand the strength of their passwords and mitigates the risk of breaches by encouraging the use of stronger passwords.

The key goals of this project are:

To provide an intuitive model for classifying passwords based on strength.
To analyze and clean data from an SQL database for high-quality input.
To apply natural language processing techniques, specifically TF-IDF vectorization, to improve prediction accuracy.

Features

Password Classification: Classifies passwords into Strong, Good, or Weak categories based on learned features.
Data Visualization: Provides insights into password strength distribution and common characteristics of each category.
User-Friendly Interface: Allows users to input passwords and receive immediate feedback on strength classification.
Robust Data Analysis: Uses thorough data cleaning and transformation techniques to ensure accurate model training.

Tech Stack

Programming Language: Python
NLP Libraries: NLTK, Scikit-learn (TF-IDF vectorization)
Data Processing: Pandas, NumPy, SQLAlchemy
Machine Learning: Logistic Regression, TF-IDF (Term Frequency-Inverse Document Frequency)
Visualization: Matplotlib, Seaborn
Deployment: Flask (optional for web app), Jupyter Notebooks (for development)

Data Pipeline

Data Collection: Password data is collected from an SQL database, containing a diverse set of password samples.
Data Cleaning:
- Removed duplicates and irrelevant entries to ensure data quality.
- Handled missing values by imputing or removing incomplete records.
Data Transformation:
- Utilized the TF-IDF technique to convert password strings into numerical vectors for model training.
- Engineered additional features such as password length, character variety (uppercase, lowercase, numbers, symbols), and common patterns.
Exploratory Data Analysis (EDA):
- Visualized the distribution of password strengths and analyzed common characteristics of Strong, Good, and Weak passwords.
- Identified patterns that contribute to password strength.
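
The cleaning and feature-engineering steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names `password` and `strength`, the numeric labels, the sample rows, and the helper `character_features` are all assumptions made for the example.

```python
import string

import pandas as pd

# Hypothetical sample; the real column names and label encoding may differ.
raw = pd.DataFrame({
    "password": ["abc123", "abc123", None, "Tr0ub4dor&3xyz", "qwerty"],
    "strength": [0, 0, 1, 2, 0],
})

# Cleaning: drop rows with a missing password, then drop duplicate passwords.
clean = (
    raw.dropna(subset=["password"])
    .drop_duplicates(subset=["password"])
    .reset_index(drop=True)
)


def character_features(password: str) -> dict:
    """Return the length and per-class character counts for one password."""
    return {
        "length": len(password),
        "n_upper": sum(c.isupper() for c in password),
        "n_lower": sum(c.islower() for c in password),
        "n_digit": sum(c.isdigit() for c in password),
        "n_symbol": sum(c in string.punctuation for c in password),
    }


# One feature row per cleaned password, joined back onto the cleaned frame.
features = pd.DataFrame([character_features(p) for p in clean["password"]])
dataset = pd.concat([clean, features], axis=1)
```

Counts like these can be used alongside the TF-IDF vectors or simply inspected during EDA to compare the three strength categories.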

Modeling

1. Logistic Regression Model

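Chose Logistic Regression for its interpretability and its effectiveness as a baseline classifier. Because there are three classes (Strong, Good, and Weak), it is applied as a multiclass classifier rather than a binary one.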
Trained the model using the TF-IDF transformed features and the corresponding strength labels.
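
A minimal sketch of this setup with scikit-learn is shown below. The toy passwords and labels are invented for illustration, and character-level TF-IDF (`analyzer="char"`) is an assumption: the text above does not say how passwords are tokenized, but treating each character as a token is one common choice for strings that contain no words.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data; the real project trains on the database samples.
passwords = [
    "123456", "password", "qwerty",
    "Summer2021", "hello1234A", "Blue_sky99",
    "G7#kLp!2xQz@", "v9$Tm&4rW!eZ", "Xy!8#bN2@qLs",
]
labels = ["Weak"] * 3 + ["Good"] * 3 + ["Strong"] * 3

# Character-level TF-IDF (an assumption) feeding a multiclass logistic regression.
# lowercase=False keeps the upper/lower case distinction, which matters for strength.
model = make_pipeline(
    TfidfVectorizer(analyzer="char", lowercase=False),
    LogisticRegression(max_iter=1000),
)
model.fit(passwords, labels)

# Predict a label and the per-class probabilities for unseen passwords.
predicted = model.predict(["letmein", "R4#vT!9pLq$z"])
probabilities = model.predict_proba(["letmein"])
```

With so few samples the predictions themselves are not meaningful; the point is only the shape of the pipeline. The columns of `probabilities` follow the order of `model.classes_`.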

2. Model Training

Split the dataset into training and testing sets to validate model performance.
Performed hyperparameter tuning to optimize the model’s performance.
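
The split and tuning steps might look like the sketch below. Everything specific here is an assumption for illustration: the synthetic passwords generated by the hypothetical `make_password` helper, the 75/25 split, and the choice to tune only the regularization strength `C` with a grid search. The text above does not state which hyperparameters or search method were used.

```python
import random
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline

rng = random.Random(0)


def make_password(alphabet: str, length: int) -> str:
    """Hypothetical helper: draw a random string from the given alphabet."""
    return "".join(rng.choice(alphabet) for _ in range(length))


# Synthetic stand-in data: 12 samples per class.
passwords = (
    [make_password(string.digits, 6) for _ in range(12)]
    + [make_password(string.ascii_lowercase + string.digits, 9) for _ in range(12)]
    + [make_password(string.ascii_letters + string.digits + "!@#$%&", 14) for _ in range(12)]
)
labels = ["Weak"] * 12 + ["Good"] * 12 + ["Strong"] * 12

# Stratified split keeps the class proportions equal in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    passwords, labels, test_size=0.25, stratify=labels, random_state=42
)

pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char", lowercase=False),
    LogisticRegression(max_iter=1000),
)

# Grid search over the regularization strength, using 3-fold CV on the training set only.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The held-out test set is touched once, after tuning is finished.
test_accuracy = search.score(X_test, y_test)
```

Tuning inside the training set and scoring once on the held-out set keeps the test accuracy an honest estimate.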

3. Model Validation

Evaluated the model using cross-validation techniques to ensure generalization.
Assessed performance metrics on the test set to confirm model reliability.
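
As an illustration of the cross-validation step, the sketch below scores a TF-IDF plus Logistic Regression pipeline with stratified 3-fold cross-validation. The twelve passwords, the fold count, and the accuracy scoring are assumptions chosen to keep the example small.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Invented toy data: four passwords per class.
passwords = [
    "123456", "password", "qwerty", "111111",
    "Summer2021", "hello1234A", "Blue_sky99", "mydog2015X",
    "G7#kLp!2xQz@", "v9$Tm&4rW!eZ", "Xy!8#bN2@qLs", "p#4Jw!Zr8&Tn",
]
labels = ["Weak"] * 4 + ["Good"] * 4 + ["Strong"] * 4

model = make_pipeline(
    TfidfVectorizer(analyzer="char", lowercase=False),
    LogisticRegression(max_iter=1000),
)

# Stratified folds keep all three classes present in every training split.
folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# The whole pipeline is refit per fold, so TF-IDF statistics never see the validation fold.
scores = cross_val_score(model, passwords, labels, cv=folds, scoring="accuracy")
mean_score = scores.mean()
```

Passing the pipeline, rather than pre-computed TF-IDF vectors, is the design choice that matters here: it prevents vocabulary and document-frequency information from leaking out of the validation folds.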

Evaluation Metrics

The model is evaluated using the following metrics to ensure accurate password classification:

Accuracy: Measures the overall correctness of the model's predictions.
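Precision: For a given class, the ratio of true positive predictions to all predictions of that class. For example, the share of passwords labeled Strong that really are Strong.
Recall: For a given class, the ratio of true positive predictions to all actual members of that class. For example, the share of truly Strong passwords that the model labels Strong. With three classes, precision and recall are computed per class and can then be averaged.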
F1 Score: The harmonic mean of precision and recall, providing a balanced measure of performance.
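
To make the per-class definitions concrete, here is a small worked example using scikit-learn's metric functions. The true and predicted labels are made up for illustration and are not results from this project.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Made-up labels: 3 Strong, 3 Good, and 4 Weak passwords.
y_true = ["Strong", "Strong", "Strong", "Good", "Good", "Good",
          "Weak", "Weak", "Weak", "Weak"]
y_pred = ["Strong", "Strong", "Good", "Good", "Good", "Weak",
          "Weak", "Weak", "Weak", "Good"]

class_order = ["Strong", "Good", "Weak"]

# 7 of the 10 predictions match the true label.
accuracy = accuracy_score(y_true, y_pred)

# With average=None (the default) each returned array has one entry per class.
precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=class_order, zero_division=0
)

# Macro F1: the unweighted mean of the per-class F1 scores.
macro_f1 = f1.mean()

for name, p, r, f in zip(class_order, precision, recall, f1):
    print(f"{name:<6} precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this example Strong has precision 1.00 (both Strong predictions are correct) but recall 0.67 (one of the three Strong passwords was labeled Good), which is why accuracy alone can hide class-specific weaknesses.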

The Logistic Regression model achieved a high accuracy score on the test set.

Setup Instructions

Prerequisites

Python 3.7+
Required libraries: Pandas, NumPy, Scikit-learn, NLTK, SQLAlchemy, Matplotlib, Seaborn

Installation

Clone the repository: git clone https://github.com/SamJoeSilvano/Password_Strength_Prediction_using_NLP.git
Navigate to the project directory: cd Password_Strength_Prediction_using_NLP
Install the dependencies: pip install -r requirements.txt (if the repository does not include this file, install the libraries listed under Prerequisites with pip)
Open the notebook: jupyter notebook NLP_Password_Prediction.ipynb
Or, if the optional Flask app is present, run it: python app.py

Usage

Load Data: Import password data from the SQL database.
Visualize Trends: Generate visualizations to understand password strength distribution.
Train Model: The Logistic Regression model is trained on the processed password data.
Classify Passwords: Input passwords into the model to receive classification as Strong, Good, or Weak.
Evaluate Model: Analyze performance metrics to ensure classification accuracy.
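
The Load Data step could look something like the sketch below. To stay self-contained it swaps in Python's built-in `sqlite3` with an in-memory database in place of the project's SQLAlchemy connection. The table name `passwords`, the column names, and the 0/1/2 label encoding are assumptions for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the project's SQL database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passwords (password TEXT, strength INTEGER)")
conn.executemany(
    "INSERT INTO passwords VALUES (?, ?)",
    [("123456", 0), ("Summer2021", 1), ("G7#kLp!2xQz@", 2)],
)
conn.commit()

# Read the table into a DataFrame for cleaning and modeling.
df = pd.read_sql_query("SELECT password, strength FROM passwords", conn)

# Assumed encoding: map numeric strength codes to readable category names.
label_names = {0: "Weak", 1: "Good", 2: "Strong"}
df["label"] = df["strength"].map(label_names)

conn.close()
```

With a real database, the same `pd.read_sql_query` call would take a SQLAlchemy connection created from the database URL instead of the `sqlite3` connection used here.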

Future Enhancements

Advanced NLP Techniques: Explore advanced models such as recurrent neural networks (RNNs) or transformers for improved classification.
User Interface: Develop a more robust web application to allow users to test and visualize password strength interactively.
Real-time Feedback: Implement real-time password strength feedback as users create passwords.
Broader Dataset: Incorporate a wider range of password samples to enhance model robustness.

Contributing

Contributions are welcome! Here’s how you can help:

Fork the project.
Create a new feature branch (git checkout -b feature-branch).
Commit your changes (git commit -m 'Add new feature').
Push to the branch (git push origin feature-branch).
Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Acknowledgements

Thanks to the open-source community for their invaluable libraries and resources that made this project possible.
Special recognition to the researchers and developers focused on enhancing password security.
