
Credit Card Fraud Detection

  • Credit Card Fraud Detection is the graduation project idea chosen by our team for DEPI (the Digital Egypt Pioneers Initiative).

Table Of Contents:

  1. About The Idea
  2. The Dataset
  3. Project Workflow
  4. Data Preprocessing
  5. Model Selection And Training
  6. Inference And Evaluation
  7. Model Deployment
  8. Workflow
  9. Acknowledgements

1- About The Idea:

Credit card fraud is a major source of financial trouble, not only for consumers but also for banks.

Credit card fraud detection is a set of tools and protocols that card issuers use to detect suspicious activity that could indicate a fraud attempt. These tools are generally proactive, aiming to stop credit card fraud before it starts, and they help prevent the financial losses that fraud causes.

Why is it important?

Credit Card Fraud Detection helps to prevent financial losses caused by credit card fraud.

2- The Dataset:

The dataset contains transactions made by credit cards in September 2013 by European cardholders. It covers transactions that occurred over two days, with 492 frauds out of 284,807 transactions. The dataset is highly unbalanced: the positive class (frauds) accounts for only 0.172% of all transactions. You can find the dataset and read more about it here.

3- Project Workflow

Project workflow is as follows:

Project Workflow

Data was collected and downloaded from here.

Setting up the project workspace on the local PC:

  1. Unzipped the dataset and extracted the CSV file from the archive.
  2. Initialized the local directory as a Git repository, after making sure to be in the correct directory (either by navigating with the cd command followed by the path, or by using "Git Bash Here" from the shortcut menu), using the command
    git init
  3. Renamed the master branch to main using the command:
    git branch -m master main
  4. Created a repository on GitHub with the same name as the project and connected the local and remote repositories together using
    git remote add origin 'url'
  5. Created the preprocessing notebook.

4- Preprocessing

Before feeding the data into the model, a data preprocessing step is performed first. The preprocessing techniques used:

  1. Ensuring the data is clean (no null values, outliers, etc.)
  2. Checking the class imbalance (solved using sampling)
  3. PCA had already been applied to the data, so we ensured that all the values are normalized using the standard scaler for better training (a sketch of this step follows this list).
  4. Exporting the cleansed data.
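A minimal sketch of the scaling and export step (items 3 and 4 above), assuming the cleaned DataFrame is named df; the output file name is an assumption, not the project's actual value:

    import pandas as pd
    from sklearn.preprocessing import StandardScaler

    # Separate features and target ('Class' is the label column in this dataset)
    X = df.drop(columns=["Class"])
    y = df["Class"]

    # Standardize the feature columns so they share a comparable scale
    scaler = StandardScaler()
    X_scaled = pd.DataFrame(scaler.fit_transform(X), columns=X.columns, index=X.index)

    # Export the cleansed, scaled data for the training step (file name is illustrative)
    pd.concat([X_scaled, y], axis=1).to_csv("cleansed_data.csv", index=False)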

1. Cleansing the data:

  • After importing the dataset into the notebook using pandas, we gained some insights about it using the .info() function, as follows:

dataset insights

  • Only the target column is of type integer; the rest of the columns are of type float. They all have the same count (no missing values); you can check for null values using df.isnull().sum(), which sums the null values of each column.
  • Because the dataset is so large, we also have to check the class imbalance to prevent bias (a quick check is shown in the sketch below):
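A minimal sketch of these checks, assuming the extracted CSV is named creditcard.csv (the Kaggle default; adjust the path if needed):

    import pandas as pd

    # Load the extracted CSV file
    df = pd.read_csv("creditcard.csv")

    # Column dtypes and non-null counts
    df.info()

    # Confirm there are no missing values in any column
    print(df.isnull().sum())

    # Check the class balance: 0 = legitimate, 1 = fraud
    print(df["Class"].value_counts())
    print(df["Class"].value_counts(normalize=True) * 100)  # percentages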

class imbalance

  • After visualizing the class balance, we found that one class is far smaller than the other, so there is a class imbalance problem here that needs to be solved.

  • There are multiple techniques for resolving class imbalance: GANs, SMOTE, resampling, etc. We used resampling with the scikit-learn library.

  • The data was resampled down to a small, balanced subset (undersampling), because oversampling would make the dataset too large (see the sketch after the figure below).

resampling
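A minimal sketch of the undersampling step with scikit-learn, assuming the DataFrame is named df and the label column is Class; matching the fraud-class size exactly is an assumption for illustration:

    import pandas as pd
    from sklearn.utils import resample

    fraud = df[df["Class"] == 1]
    non_fraud = df[df["Class"] == 0]

    # Downsample the majority (non-fraud) class to the size of the fraud class
    non_fraud_down = resample(
        non_fraud,
        replace=False,            # sample without replacement
        n_samples=len(fraud),     # match the minority class size
        random_state=42,          # reproducibility
    )

    # Combine and shuffle to form the balanced training set
    balanced = pd.concat([fraud, non_fraud_down]).sample(frac=1, random_state=42)
    print(balanced["Class"].value_counts())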

  • Another technique we applied was augmenting the minority class using GANs:
    • Augmentation using GAN: To avoid the information loss that comes with undersampling, we applied Generative Adversarial Networks (GANs) to synthetically generate new fraud examples, increasing the size of the fraud class to match the non-fraudulent transactions. While this technique produced more balanced data, it resulted in a large dataset that slowed down computation due to the sheer number of non-fraud cases (over 200k).
    • Our goal was to prove a point with the GAN model: it can be used to augment numerical data, even data that had already been transformed with PCA, which added another challenge for the GAN. The idea was inspired by this blog: Fraud Detection With GANs. A minimal sketch is shown below.
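A minimal sketch of a tabular GAN for generating synthetic fraud rows, written with Keras; the network sizes, latent dimension, and training settings here are assumptions for illustration, not the project's exact configuration:

    import numpy as np
    from tensorflow.keras import layers, models

    latent_dim = 32
    n_features = 30  # e.g. V1..V28 + Time + Amount (assumption)

    # Generator: maps random noise to a synthetic transaction feature vector
    generator = models.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Dense(n_features, activation="linear"),
    ])

    # Discriminator: scores a feature vector as real (1) or synthetic (0)
    discriminator = models.Sequential([
        layers.Input(shape=(n_features,)),
        layers.Dense(64, activation="relu"),
        layers.Dense(32, activation="relu"),
        layers.Dense(1, activation="sigmoid"),
    ])
    discriminator.compile(optimizer="adam", loss="binary_crossentropy")

    # Combined model: freeze the discriminator here so that training the
    # combined model only updates the generator (the discriminator keeps
    # training through its own compiled model above)
    discriminator.trainable = False
    gan = models.Sequential([generator, discriminator])
    gan.compile(optimizer="adam", loss="binary_crossentropy")

    def train_gan(real_fraud, epochs=2000, batch_size=64):
        for _ in range(epochs):
            # Train the discriminator on half real, half generated samples
            idx = np.random.randint(0, real_fraud.shape[0], batch_size)
            noise = np.random.normal(size=(batch_size, latent_dim))
            fake = generator.predict(noise, verbose=0)
            discriminator.train_on_batch(real_fraud[idx], np.ones((batch_size, 1)))
            discriminator.train_on_batch(fake, np.zeros((batch_size, 1)))

            # Train the generator (via the combined model) to fool the discriminator
            noise = np.random.normal(size=(batch_size, latent_dim))
            gan.train_on_batch(noise, np.ones((batch_size, 1)))

    # Usage: train on the scaled fraud rows, then sample synthetic frauds
    # train_gan(fraud_scaled.values)
    # synthetic = generator.predict(np.random.normal(size=(10_000, latent_dim)), verbose=0)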