Credit Card Fraud Detection

1- About The Idea:

Credit Card Fraud Detection is the graduation project our team chose for DEPI (Digital Egypt Pioneers Initiative).

Credit card fraud detection is a set of tools and protocols that card issuers use to detect suspicious activity that could indicate a fraud attempt. These tools are generally proactive, aiming to stop credit card fraud before it starts. They also help to prevent financial losses caused by credit card fraud.

Why is it important?

Credit Card Fraud Detection helps to prevent financial losses caused by credit card fraud.

2- The Dataset:

The dataset contains credit card transactions made by European cardholders in September 2013. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly imbalanced: the positive class (frauds) accounts for 0.172% of all transactions. The dataset and more details about it are available here.
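As a quick sanity check on the figures above, the imbalance can be worked out directly from the two counts. The snippet below is only an illustrative sketch; the variable names are ours, and the "always predict legitimate" baseline is added to show why plain accuracy is a misleading metric on this dataset.

```python
# Counts quoted in the dataset description.
n_transactions = 284_807
n_frauds = 492

# Fraction of transactions that are fraudulent (the positive class).
fraud_share = n_frauds / n_transactions

# Number of legitimate transactions for every fraudulent one.
legit_per_fraud = (n_transactions - n_frauds) / n_frauds

# Accuracy of a trivial model that labels every transaction as legitimate.
baseline_accuracy = 1 - fraud_share

print(f"fraud share: {fraud_share:.4%}")
print(f"legitimate transactions per fraud: {legit_per_fraud:.1f}")
print(f"accuracy of an 'always legitimate' model: {baseline_accuracy:.4%}")
```

A model that never flags fraud would still be correct on more than 99.8% of transactions, so metrics such as precision and recall are more informative here.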

3- Project Workflow

The project workflow is as follows:

Data was collected and downloaded from here.

Setting up the project workspace on a local PC:

Unzipped the dataset and extracted the CSV file from the directory.
Initialized the local directory as a Git repository using the command:

    git init

Before running it, we made sure we were in the correct directory, either by navigating with the cd command followed by the path or by choosing "Git Bash Here" from the folder's context menu.
Renamed the master branch to main using the command:

    git branch -m master main

Created a repository on GitHub with the same name as the problem and connected the local repository to the remote one using:

    git remote add origin 'url'

Created the preprocessing notebook.

4- Preprocessing

Before the data is fed into the model, it goes through a preprocessing step. The preprocessing techniques used were:

Ensuring the data is clean (no null values, outliers, etc.).
Checking the class imbalance (solved using sampling).
PCA had already been applied to the data, so we only ensured that all values were normalized using the standard scaler for better training.
Exporting the cleansed data.
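The normalization step can be sketched with scikit-learn's StandardScaler, which rescales each column to zero mean and unit variance. This is a minimal illustration on a made-up feature matrix, not the notebook's actual code; the values and the variable names X and X_scaled are assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: each row is a transaction, each column a feature.
# The two columns deliberately have very different scales.
X = np.array([
    [1.0, 200.0],
    [2.0, 50.0],
    [3.0, 125.0],
    [4.0, 25.0],
])

# Fit the scaler on the data and transform it in one call.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# After scaling, every column has mean ~0 and standard deviation ~1.
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

When the data is later split for training and evaluation, a common practice is to fit the scaler on the training split only and reuse it to transform the test split.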

1. Cleansing the data:

After importing the dataset into the notebook using pandas, we gained some insights about it using the .info() function:

Only the target column is of integer type; the rest of the columns are floats. All columns have the same count (no missing values). You can also check for null values using df.isnull().sum(), which sums the null values in each column.
The dataset is large, so we have to check for class imbalance to prevent bias.
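The inspection steps described above can be sketched as follows. A small hand-made DataFrame stands in for the real CSV so that the snippet runs on its own; the column names (V1, V2, Amount, Class) and the values are assumptions used for illustration only.

```python
import pandas as pd

# Toy stand-in for the real dataset; column names and values are hypothetical.
df = pd.DataFrame({
    "V1": [-1.36, 1.19, -1.36, -0.97],
    "V2": [-0.07, 0.27, -1.34, -0.19],
    "Amount": [149.62, 2.69, 378.66, 123.50],
    "Class": [0, 0, 1, 0],  # target column: 1 = fraud, 0 = legitimate
})

# Print column dtypes and non-null counts.
df.info()

# Count null values per column; all zeros means nothing is missing.
null_counts = df.isnull().sum()
has_missing = bool(null_counts.any())

# Count rows per class to see how imbalanced the target is.
class_counts = df["Class"].value_counts()

print(null_counts)
print(class_counts)
```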

After visualizing the class balance, we found that one of the two classes is far smaller than the other, so there is a class imbalance problem to solve.
There are multiple techniques for resolving class imbalance: GANs, SMOTE, resampling, etc. We used resampling with the scikit-learn library.
The data was resampled into a smaller sample (undersampling), because oversampling would make the data too large.
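A minimal sketch of undersampling with scikit-learn is shown below. It assumes sklearn.utils.resample is the resampling utility and that the majority class is cut down to the size of the minority class (a 1:1 ratio); neither detail is stated in this README, and the toy DataFrame and its column names are hypothetical.

```python
import pandas as pd
from sklearn.utils import resample

# Toy imbalanced dataset: 8 legitimate rows (Class 0) and 2 fraud rows (Class 1).
df = pd.DataFrame({
    "Amount": [10.0, 25.5, 3.2, 99.9, 47.0, 12.3, 5.0, 310.0, 8.8, 64.1],
    "Class": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
})

majority = df[df["Class"] == 0]
minority = df[df["Class"] == 1]

# Randomly keep only as many majority rows as there are minority rows.
majority_down = resample(
    majority,
    replace=False,             # sample without replacement
    n_samples=len(minority),   # assumed 1:1 target ratio
    random_state=42,           # fixed seed for reproducibility
)

# Recombine the two classes and shuffle the rows.
balanced = (
    pd.concat([majority_down, minority])
    .sample(frac=1, random_state=42)
    .reset_index(drop=True)
)

print(balanced["Class"].value_counts())
```

Undersampling discards most of the majority class, which keeps training fast but loses information. That trade-off is the motivation for also trying augmentation of the minority class.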

Another technique we applied was augmenting the minority class using GANs:
- Augmentation using GAN: To avoid information loss, we applied Generative Adversarial Networks (GANs) to synthetically generate new fraud examples, thereby increasing the size of the fraud class to match the non-fraudulent transactions. While this technique provided more balanced data, it led to a large dataset, slowing down the computation due to the sheer number of non-fraud cases (over 200k).
- We only needed to prove a point with the GAN model: that it can be used to augment numerical data, even data that has already been transformed with PCA, which added another challenge for the GAN. The idea was inspired by this blog: Fraud Detection With GANs

